Abstract. Filtering is a widely used methodology for the incorporation of observed 
data into time-evolving systems. It provides an online approach to state estimation 
inverse problems when data is acquired sequentially. The Kalman filter plays a central 
role in many applications because it is exact for linear systems subject to Gaussian 
noise, and because it forms the basis for many approximate filters which are used in 
high dimensional systems. The aim of this paper is to study the effect of model error on 
the Kalman filter, in the context of linear wave propagation problems. A consistency 
result is proved when no model error is present, showing recovery of the true signal in 
the large data limit. This result, however, is not robust: it is also proved that arbitrarily 
small model error can lead to inconsistent recovery of the signal in the large data limit. 
If the model error is in the form of a constant shift to the velocity, the filtering and 
smoothing distributions only recover a partial Fourier expansion, a phenomenon related 
to aliasing. On the other hand, for a class of wave velocity model errors which are 
time-dependent, it is possible to recover the filtering distribution exactly, but not the 
smoothing distribution. Numerical results are presented which corroborate the theory, 
and a computational approach is proposed which overcomes the inconsistency in 
the presence of model error by relaxing the model. 
1. Introduction 

Filtering is a methodology for the incorporation of data into time-evolving systems 
[1, 2]. It provides an online approach to state estimation inverse problems when data is 
acquired sequentially. In its most general form the dynamics and/or observing system 
are subject to noise and the objective is to compute the probability distribution of the 
current state, given observations up to the current time, in a sequential fashion. The 
Kalman filter [3] carries out this process exactly for linear dynamical systems subject to 
additive Gaussian noise. A key aspect of filtering in many applications is to understand 
the effect of model error - the mismatch between the model used to filter and the source of 
the data itself. In this paper we undertake a study of the effect of model error on the 
Kalman filter in the context of linear wave problems. 

Section 2 is devoted to describing the linear wave problem of interest, and deriving 
the Kalman filter for it. The iterative formulae for the mean and covariance are solved 
and the equivalence (as measures) of the filtering distribution at different times is 
studied. In section 3 we study consistency of the filter, examining its behaviour in 
the large time limit as more and more observations are accumulated, at points which 
are equally spaced in time. It is shown that, in the absence of model error, the filtering 
distribution on the current state converges to a Dirac measure on the truth. However, 
for the linear advection equation, it is also shown that arbitrarily small model error, in 
the form of a shift to the wave velocity, destroys this property: the filtering distribution 
converges to a Dirac measure, but it is not centred on the truth. Thus the order of two 
operations, namely the successive incorporation of data and the limit of vanishing model 
error, cannot be switched; this means, practically, that small model error can induce 
order one errors in filters, even in the presence of large amounts of data. All of the 
results in section 3 apply to the smoothing distribution on the initial condition, as well 
as the filtering distribution on the current state. Section 4 concerns non-autonomous 
systems, and the effect of model error. We study the linear advection equation in two 
dimensions with time-varying wave velocity. Two forms of model error are studied: an 
error in the wave velocity which is integrable in time, and a white noise error. In the 
first case it is shown that the filtering distribution converges to a Dirac measure on 
the truth, whilst the smoothing distribution converges to a Dirac measure which is in 
error, i.e., not centred on the truth. In the second, white noise, case both the filter 
and smoother converge to a Dirac measure which is in error. In section 5 we present 
numerical results which illustrate the theoretical results. We also describe a numerical 
approach which overcomes the effect of model error by relaxing the model to allow the 
wave velocity to be learnt from the data. 

We conclude the introduction with a brief review of the literature in this area. 
Filtering in high dimensional systems is important in a range of applications, especially 
within the atmospheric and geophysical sciences [4, 5, 6, 7]. Recent theoretical studies 
have shown how the Kalman filter can not only systematically incorporate data into a 
model, but also stabilize model error arising from an unstable numerical discretization 
[8] or from an inadequate choice of parameter [9, 10]; the effect of physical model 
error in a particular application is discussed in [11]. The Kalman filter is used as the 
basis for a number of approximate filters which are employed in nonlinear and 
non-Gaussian problems, and the ensemble Kalman filter in particular is widely used in this 
context [5, 12, 13]. Although robust and widely usable, the ensemble Kalman filter 
does not provably reproduce the true distribution of the signal in the large ensemble 
limit, given data, except in the Gaussian case. For this reason it would be desirable to 
use the particle filter [2] on highly non-Gaussian systems. However, recent theoretical 
work and a range of computational experience show that, in their current form, particle 
filters will not work well in high dimensions [14, 15, 16]. As a consequence a great 
deal of research activity is aimed at the development of various approximation schemes 
within the filtering context; see [17, 18, 19, 20, 21, 22] for example. The subject of 
consistency of Bayesian estimators for noisily observed systems, which forms the core 
of our theoretical work in this paper, is an active area of research. In the infinite 
dimensional setting, much of this work is concerned with linear Gaussian systems, as we 
are here, but is primarily aimed at the perfect model scenario [23, 24, 25]. The issue of 
model error arising from spatial discretization, when filtering linear PDEs, is studied in 
[26]. The idea of relaxing the model and learning parameters in order to obtain a better 
fit to the data, considered in our numerical studies, is widely used in filtering (see the 
chapter by Künsch in [2], and the paper [27] for an application in data assimilation). 

2. Kalman Filter on Function Space 

2.1. Statistical model for discrete observations 

The test model, which we propose here, is a class of PDEs 

$$\partial_t v + \mathcal{L} v = 0, \qquad \forall (x,t) \in \mathbb{T}^2 \times (0,\infty), \tag{1}$$

on a two dimensional torus. Here $\mathcal{L}$ is an anti-Hermitian operator satisfying 
$\mathcal{L}^* = -\mathcal{L}$, where $\mathcal{L}^*$ is the adjoint in $L^2(\mathbb{T}^2)$. Equation (1) describes a prototypical linear wave 
system and the advection equation with velocity $c$ is the simplest example: 

$$\partial_t v + c \cdot \nabla v = 0, \qquad \forall (x,t) \in \mathbb{T}^2 \times (0,\infty). \tag{2}$$

The state estimation problem for Equation (1) requires finding the 'best' estimate of the 
solution $v(t)$ (shorthand for $v(\cdot,t)$) given a random initial condition $v_0$ and a set of noisy 
observations, called data. Suppose the data is collected at discrete times $t_n = n\Delta t$; then 
we assume that the entire solution $v_n = v(t_n)$ on the torus is observed with additive 
noise $\eta_n$ at time $t_n$, and further that the $\eta_n$ are independent for different $n$. Realizations 
of the noise $\eta_n$ are $L^2(\mathbb{T}^2)$-valued random fields and the observations $y_n$ are given by 

$$y_n = v_n + \eta_n = e^{-t_n\mathcal{L}} v_0 + \eta_n, \qquad \forall x \in \mathbb{T}^2. \tag{3}$$

Here $e^{-t\mathcal{L}}$ denotes the forward solution operator for (1) through $t$ time units. Let 
$Y_N = \{y_1, \dots, y_N\}$ be the collection of data up to time $t_N$; then we are interested in 
finding the conditional distribution $\mathbb{P}(v_n|Y_N)$ on the Hilbert space $L^2(\mathbb{T}^2)$. If $n = N$, 
this is called the filtering problem, if $n < N$ it is the smoothing problem and for $n > N$, 
the prediction problem. We emphasize here that all of the problems are equivalent for 
our deterministic system in that any one measure defines the other simply by a push 
forward under the linear map defined by Equation (1). 

In general, calculation of the filtering distribution $\mathbb{P}(v_n|Y_n)$ is computationally 
challenging when the state space for $v_n$ is large, as it is here. A key idea is to estimate the 
signal $v_n$ through sequential updates consisting of a two-step process: prediction by time 
evolution, and analysis through data assimilation. We first perform a one-step statistical 
prediction to obtain $\mathbb{P}(v_{n+1}|Y_n)$ from $\mathbb{P}(v_n|Y_n)$ through some forward operator. This is 
followed by an analysis step which corrects the probability distribution on the basis of 
the statistical input of noisy observations of the system using Bayes' rule: 

$$\frac{\mathbb{P}(dv_{n+1}|Y_{n+1})}{\mathbb{P}(dv_{n+1}|Y_n)} \propto \mathbb{P}(y_{n+1}|v_{n+1}). \tag{4}$$

This relationship exploits the assumed independence of the observational noises $\eta_n$ 
for different $n$. In our case, where the signal $v_n$ is a function, this identity should 
be interpreted as providing the Radon-Nikodym derivative (density) of the measure 
$\mathbb{P}(dv_{n+1}|Y_{n+1})$ with respect to $\mathbb{P}(dv_{n+1}|Y_n)$ [28]. 

In general, implementation of this scheme is non-trivial as it requires approximation 
of the probability distributions at each step indexed by $n$. In the case of infinite 
dimensional dynamics this may be particularly challenging. However, for linear 
dynamical systems such as (1), together with linear observations (3) subject to Gaussian 
observation noise $\eta_n$, this may be achieved by use of the Kalman filter [3]. Our 
work in this paper begins with Theorem 2.1, which is a straightforward extension of 
the traditional Kalman filter theory in finite dimension to measures on an infinite 
dimensional function space. For reference, basic results concerning Gaussian measures 
required for this paper are gathered in Appendix A. 



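Theorem 2.1. Let $v_0$ be distributed according to a Gaussian measure $\mu_0 = \mathcal{N}(m_0, \mathcal{C}_0)$ 
on $L^2(\mathbb{T}^2)$ and let $\{\eta_n\}_{n\in\mathbb{N}}$ be i.i.d. draws from the $L^2(\mathbb{T}^2)$-valued Gaussian measure 
$\mathcal{N}(0, \Gamma)$. Assume further that $v_0$ and $\{\eta_n\}_{n\in\mathbb{N}}$ are independent of one another, and that 
$\mathcal{C}_0$ and $\Gamma$ are strictly positive. Then the conditional distribution $\mathbb{P}(v_n|Y_n) \equiv \mu^n$ is a 
Gaussian $\mathcal{N}(m^n, \mathcal{C}^n)$ with mean and covariance satisfying the recurrence relations 

$$m^{n+1} = e^{-\Delta t\mathcal{L}} m^n - e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*} \left(\Gamma + e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*}\right)^{-1} \left(e^{-\Delta t\mathcal{L}} m^n - y_{n+1}\right), \tag{5a}$$

$$\mathcal{C}^{n+1} = e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*} - e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*} \left(\Gamma + e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*}\right)^{-1} e^{-\Delta t\mathcal{L}} \mathcal{C}^n e^{-\Delta t\mathcal{L}^*}, \tag{5b}$$

where $m^0 = m_0$, $\mathcal{C}^0 = \mathcal{C}_0$. 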

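Proof. Let $m_{n|N} = \mathbb{E}(v_n|Y_N)$ and $\mathcal{C}_{n|N} = \mathbb{E}\left[(v_n - m_{n|N})(v_n - m_{n|N})^*\right]$ denote the mean 
and covariance operator of $\mathbb{P}(v_n|Y_N)$, so that $m^n = m_{n|n}$, $\mathcal{C}^n = \mathcal{C}_{n|n}$. Now the prediction 
step reads 

$$\begin{aligned}
m_{n+1|n} &= \mathbb{E}\left(e^{-\Delta t\mathcal{L}} v_n \,|\, Y_n\right) = e^{-\Delta t\mathcal{L}} m_{n|n}, \\
\mathcal{C}_{n+1|n} &= \mathbb{E}\left[e^{-\Delta t\mathcal{L}} (v_n - m_{n|n})(v_n - m_{n|n})^* e^{-\Delta t\mathcal{L}^*}\right] = e^{-\Delta t\mathcal{L}} \mathcal{C}_{n|n} e^{-\Delta t\mathcal{L}^*}.
\end{aligned} \tag{6}$$

To get the analysis step, choose $x_1 = v_{n+1}|Y_n$ and $x_2 = y_{n+1}|Y_n$; then $(x_1, x_2)$ is jointly 
Gaussian with mean $(m_{n+1|n}, m_{n+1|n})$ and the components of the covariance operator 
for $(x_1, x_2)$ are given by 

$$\begin{aligned}
\mathcal{C}_{11} &= \mathbb{E}\left[(x_1 - m_{n+1|n})(x_1 - m_{n+1|n})^*\right] = \mathcal{C}_{n+1|n}, \\
\mathcal{C}_{22} &= \mathbb{E}\left[(x_2 - m_{n+1|n})(x_2 - m_{n+1|n})^*\right] = \Gamma + \mathcal{C}_{n+1|n}, \\
\mathcal{C}_{12} &= \mathbb{E}\left[(x_1 - m_{n+1|n})(x_2 - m_{n+1|n})^*\right] = \mathcal{C}_{n+1|n} = \mathcal{C}_{21}.
\end{aligned}$$

Using Lemma A.2 we obtain 

$$\begin{aligned}
m_{n+1|n+1} &= m_{n+1|n} - \mathcal{C}_{n+1|n} \left(\Gamma + \mathcal{C}_{n+1|n}\right)^{-1} \left(m_{n+1|n} - y_{n+1}\right), \\
\mathcal{C}_{n+1|n+1} &= \mathcal{C}_{n+1|n} - \mathcal{C}_{n+1|n} \left(\Gamma + \mathcal{C}_{n+1|n}\right)^{-1} \mathcal{C}_{n+1|n}.
\end{aligned} \tag{7}$$

Combining Equations (6) and (7) yields Equation (5). □ 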



Note that the distribution of $v_0|Y_n$ is also Gaussian and we denote its mean 
and covariance by $m_n$ and $\mathcal{C}_n$ respectively. The measure $\mu_n$ is the image of $\mu^n$ under 
the linear transformation $e^{t_n\mathcal{L}}$. Thus we have $m_n = e^{t_n\mathcal{L}} m^n$ and $\mathcal{C}_n = e^{t_n\mathcal{L}} \mathcal{C}^n e^{t_n\mathcal{L}^*}$. 
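
As an illustration of Theorem 2.1, the following minimal sketch carries out one prediction/analysis cycle for a single Fourier mode, under the assumption (made later in Assumptions 2.3) that all operators are diagonal in a common basis, so that each mode can be filtered as a scalar. The function name `kalman_step` and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath


def kalman_step(m, c, y, ell, dt, gamma):
    """One prediction/analysis cycle of (5a)-(5b) restricted to one Fourier mode.

    m     : complex filter mean of the mode at time t_n
    c     : real filter variance of the mode at time t_n
    y     : complex observation of the mode at time t_{n+1}
    ell   : eigenvalue of L on this mode (assumed purely imaginary)
    dt    : observation interval Delta t
    gamma : observational noise variance of the mode
    """
    phase = cmath.exp(-dt * ell)                   # e^{-dt L} acting on the mode
    m_pred = phase * m                             # predicted mean, as in (6)
    c_pred = (phase * c * phase.conjugate()).real  # predicted variance, as in (6)
    gain = c_pred / (gamma + c_pred)               # C (Gamma + C)^{-1}
    m_new = m_pred - gain * (m_pred - y)           # analysis mean, as in (7)
    c_new = c_pred - gain * c_pred                 # analysis variance, as in (7)
    return m_new, c_new


if __name__ == "__main__":
    # Hypothetical values: prior N(1, 2) on the mode, noise variance 0.5.
    print(kalman_step(1.0 + 0.0j, 2.0, 0.2 - 0.1j, 2j * cmath.pi * 0.3, 0.1, 0.5))
```

Because the multiplier `phase` has modulus one, the prediction step leaves the variance unchanged and only the analysis step reduces it.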

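In this paper we study a wave propagation problem for which $\mathcal{L}$ is anti-Hermitian. 
Furthermore we assume that both $\mathcal{C}_0$ and $\Gamma$ commute with $\mathcal{L}$. Then the formulae (5) 
simplify to give 

$$m^{n+1} = e^{-\Delta t\mathcal{L}} m^n - \mathcal{C}^n \left(\Gamma + \mathcal{C}^n\right)^{-1} \left(e^{-\Delta t\mathcal{L}} m^n - y_{n+1}\right), \tag{8a}$$

$$\mathcal{C}^{n+1} = \mathcal{C}^n - \mathcal{C}^n \left(\Gamma + \mathcal{C}^n\right)^{-1} \mathcal{C}^n. \tag{8b}$$

The following gives explicit expressions for $(m_n, \mathcal{C}_n)$ and $(m^n, \mathcal{C}^n)$, based on the 
Equations (8). 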

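Corollary 2.2. Suppose that $\mathcal{C}_0$ and $\Gamma$ commute with the anti-Hermitian operator $\mathcal{L}$; 
then the means and covariance operators of $\mu_n$ and $\mu^n$ are given by 

$$m_n = \left(nI + \Gamma \mathcal{C}_0^{-1}\right)^{-1} \left[\Gamma \mathcal{C}_0^{-1} m_0 + \sum_{l=0}^{n-1} e^{t_{l+1}\mathcal{L}} y_{l+1}\right], \tag{9a}$$

$$\mathcal{C}_n = \left(n\Gamma^{-1} + \mathcal{C}_0^{-1}\right)^{-1}, \tag{9b}$$

and $m^n = e^{-t_n\mathcal{L}} m_n$, $\mathcal{C}^n = \mathcal{C}_n$. 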

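Proof. Assume for induction that $\mathcal{C}^n$ is invertible. Then the identity 

$$\begin{aligned}
&\left(\Gamma^{-1} + (\mathcal{C}^n)^{-1}\right) \left(\mathcal{C}^n - \mathcal{C}^n \left(\Gamma + \mathcal{C}^n\right)^{-1} \mathcal{C}^n\right) \\
&\quad = \left(\Gamma^{-1} + (\mathcal{C}^n)^{-1}\right) \left(\mathcal{C}^n - \mathcal{C}^n (\mathcal{C}^n)^{-1} \left(\Gamma^{-1} + (\mathcal{C}^n)^{-1}\right)^{-1} \Gamma^{-1} \mathcal{C}^n\right) = I
\end{aligned}$$

leads to $(\mathcal{C}^{n+1})^{-1} = \Gamma^{-1} + (\mathcal{C}^n)^{-1}$ from Equation (8b), and hence $\mathcal{C}^{n+1}$ is invertible. 
Then Equation (9b) follows by induction. By applying $e^{t_{n+1}\mathcal{L}}$ to Equation (8a) we have 

$$m_{n+1} = m_n - \mathcal{C}^n \left(\Gamma + \mathcal{C}^n\right)^{-1} \left(m_n - e^{t_{n+1}\mathcal{L}} y_{n+1}\right).$$

After using $\mathcal{C}^n(\Gamma + \mathcal{C}^n)^{-1} = \left((n+1)I + \Gamma \mathcal{C}_0^{-1}\right)^{-1}$ from Equation (9b), we obtain the 
telescoping series 

$$\left((n+1)I + \Gamma \mathcal{C}_0^{-1}\right) m_{n+1} = \left(nI + \Gamma \mathcal{C}_0^{-1}\right) m_n + e^{t_{n+1}\mathcal{L}} y_{n+1}$$

and Equation (9a) follows. The final observations follow since $m_n = e^{t_n\mathcal{L}} m^n$ and 
$\mathcal{C}_n = e^{t_n\mathcal{L}} \mathcal{C}^n e^{t_n\mathcal{L}^*} = \mathcal{C}^n$. □ 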
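
The agreement between the recursion (8) and the closed form (9) can be checked numerically on a single Fourier mode. The sketch below is only an illustration under the diagonal (scalar per mode) assumption; the function names and the data values are hypothetical.

```python
import cmath


def filter_by_recursion(m0, lam, gamma, ell, dt, ys):
    """Iterate (8a)-(8b) for one mode; ell is assumed purely imaginary."""
    m, c = m0, lam
    phase = cmath.exp(-dt * ell)          # e^{-dt L} on this mode, modulus one
    for y in ys:
        m_pred = phase * m
        gain = c / (gamma + c)            # C^n (Gamma + C^n)^{-1}
        m = m_pred - gain * (m_pred - y)  # (8a)
        c = c - gain * c                  # (8b)
    return m, c


def filter_by_closed_form(m0, lam, gamma, ell, dt, ys):
    """Evaluate (9a)-(9b) for one mode and map the smoother mean to the filter mean."""
    n = len(ys)
    # sum_{l=0}^{n-1} e^{t_{l+1} L} y_{l+1}, with t_{l+1} = (l + 1) * dt
    total = sum(cmath.exp((l + 1) * dt * ell) * y for l, y in enumerate(ys))
    m_smooth = ((gamma / lam) * m0 + total) / (n + gamma / lam)   # (9a)
    c_n = 1.0 / (n / gamma + 1.0 / lam)                           # (9b)
    m_filter = cmath.exp(-n * dt * ell) * m_smooth                # m^n = e^{-t_n L} m_n
    return m_filter, c_n


if __name__ == "__main__":
    ys = [0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.7j]   # hypothetical observations
    args = (1.0 - 1.0j, 2.0, 0.5, 2j * cmath.pi * 0.3, 0.1, ys)
    print(filter_by_recursion(*args))
    print(filter_by_closed_form(*args))
```

The two functions should return the same mean and variance up to rounding error.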
2.2. Equivalence of measures 

Now suppose we have a set of specific data and they are a single realization of the 
observation process (3), 

$$y_n(\omega) = e^{-t_n\mathcal{L}} u + \eta_n(\omega). \tag{10}$$

Here $\omega$ is an element of the probability space $\Omega$ generating the entire noise signal $\{\eta_n\}_{n\in\mathbb{N}}$. 
We assume that the initial condition $u \in L^2(\mathbb{T}^2)$ for the true signal $e^{-t_n\mathcal{L}} u$ is 
non-random and hence independent of $\mu_0$. We insert the fixed (non-random) instances of 
the data (10) into the formulae for $\mu_0$, $\mu_n$ and $\mu^n$ and will prove all three measures are 
equivalent. Recall that two measures are said to be equivalent if they are mutually 
absolutely continuous, and singular if they are concentrated on disjoint sets [29]. 
Lemma A.3 (Feldman-Hajek) tells us the conditions under which two Gaussian measures 
are equivalent. 

Before stating the theorem, we need to introduce some notation and assumptions. 
Let $\phi_k(x) = e^{2\pi i k\cdot x}$, where $k = (k_1, k_2) \in \mathbb{Z} \times \mathbb{Z} \equiv \mathbb{K}$, be a standard orthonormal basis 
for $L^2(\mathbb{T}^2)$ with respect to the standard inner product $(f, g) = \int_{\mathbb{T}^2} f \bar{g}\, dx\, dy$, where the 
upper bar denotes the complex conjugate. 

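Assumptions 2.3. The operators $\mathcal{L}$, $\mathcal{C}_0$ and $\Gamma$ are diagonalizable in the basis defined by 
the $\phi_k$. The two covariance operators, $\mathcal{C}_0$ and $\Gamma$, have positive eigenvalues $\lambda_k > 0$ and 
$\gamma_k > 0$ respectively: 

$$\mathcal{C}_0 \phi_k = \lambda_k \phi_k, \qquad \Gamma \phi_k = \gamma_k \phi_k. \tag{11}$$

The eigenvalues $\ell_k$ of $\mathcal{L}$ have zero real parts and $\mathcal{L}\phi_0 = 0$. 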

This assumption on the simultaneous diagonalization of the operators implies the 
commutativity of $\mathcal{C}_0$ and $\Gamma$ with $\mathcal{L}$. Therefore we have Corollary 2.2, and Equations (9) 
can be used to study the large $n$ behaviour of the $\mu_n$ and $\mu^n$. We believe that it might 
be possible to obtain the subsequent results without Assumptions 2.3, but to do so would 
require significantly different techniques of analysis; the simultaneously diagonalizable 
case allows a straightforward analysis in which the mechanisms giving rise to the results 
are easily understood. 

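Note that, since $\mathcal{C}_0$ and $\Gamma$ are the covariance operators of Gaussian measures on 
$L^2(\mathbb{T}^2)$, it follows from Lemma A.1 that the $\lambda_k$, $\gamma_k$ are summable: i.e., $\sum_{k\in\mathbb{K}} \lambda_k < \infty$, 
$\sum_{k\in\mathbb{K}} \gamma_k < \infty$. We define $H^s(\mathbb{T}^2)$ to be the Sobolev space of periodic functions with $s$ 
weak derivatives and 

$$\|\cdot\|^2_{H^s(\mathbb{T}^2)} = \sum_{k\in\mathbb{K}^+} |k|^{2s} |(\cdot, \phi_k)|^2 + |(\cdot, \phi_0)|^2,$$

where $\mathbb{K}^+ = \mathbb{K}\setminus\{(0,0)\}$, noting that this norm reduces to the usual $L^2$ norm when $s = 0$. 
We denote by $\|\cdot\|$ the standard Euclidean norm. 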

Theorem 2.4. If $\sum_{k\in\mathbb{K}} \lambda_k/\gamma_k^2 < \infty$, then the Gaussian measures $\mu_0$, $\mu_n$ and $\mu^n$ on the 
Hilbert space $L^2(\mathbb{T}^2)$ are equivalent $\eta$-a.s. 



Filtering for linear wave equations 






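Proof. We first show the equivalence between $\mu_0$ and $\mu_n$. For $h = \sum_{k\in\mathbb{K}} h_k \phi_k$ we get, 
from Equations (9b) and (11), 

$$\frac{1}{c^+} \le \frac{(h, \mathcal{C}_n h)}{(h, \mathcal{C}_0 h)} = \frac{\sum_{k\in\mathbb{K}} \left(n\gamma_k^{-1} + \lambda_k^{-1}\right)^{-1} h_k^2}{\sum_{k\in\mathbb{K}} \lambda_k h_k^2} \le 1,$$

where $c^+ = \sup_{k\in\mathbb{K}} \left(n\lambda_k/\gamma_k + 1\right)$. We have $c^+ \in [1, \infty)$ because $\sum_{k\in\mathbb{K}} \lambda_k/\gamma_k^2 < \infty$ and $\Gamma$ 
is trace-class. Then the first condition for Feldman-Hajek is satisfied by Lemma A.4. 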

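For the second condition, take $\{g_k^l\}_{k\in\mathbb{K}}$, where $l = 0, \dots, n-1$, to be a sequence 
of complex-valued unit Gaussians independent except for the condition $g_{-k}^l = \overline{g_k^l}$. This 
constraint ensures that the Karhunen-Loève expansion 

$$e^{t_{l+1}\mathcal{L}} \eta_{l+1} = \sum_{k\in\mathbb{K}} \sqrt{\gamma_k}\, g_k^l\, \phi_k$$

is distributed according to the real-valued Gaussian measure $\mathcal{N}(0, \Gamma)$, and is 
independent for different values of $l$. Thus 

$$\left\|\mathcal{C}_0^{-1/2}(m_n - m_0)\right\|^2_{L^2(\mathbb{T}^2)} = \sum_{k\in\mathbb{K}} \frac{1}{\lambda_k} \left|\frac{1}{n + \gamma_k/\lambda_k} \left(-n(m_0, \phi_k) + n(u, \phi_k) + \sqrt{\gamma_k} \sum_{l=0}^{n-1} g_k^l\right)\right|^2 \le C(n) \sum_{k\in\mathbb{K}} \frac{\lambda_k}{\gamma_k^2} < \infty,$$

where 

$$C(n) = \sup_{k\in\mathbb{K}} \left|-n(m_0, \phi_k) + n(u, \phi_k) + \sqrt{\gamma_k} \sum_{l=0}^{n-1} g_k^l\right|^2 < \infty$$

$\eta$-a.s. from the strong law of large numbers [30]. 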

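For the third condition, we need to show the set of eigenvalues of $T$, where 

$$T\phi_k = \left(\mathcal{C}_n^{-1/2} \mathcal{C}_0\, \mathcal{C}_n^{-1/2} - I\right)\phi_k = n\frac{\lambda_k}{\gamma_k}\phi_k,$$

are square-summable. This is satisfied since $\mathcal{C}_0$ is trace-class: 

$$\sum_{k\in\mathbb{K}} \frac{\lambda_k^2}{\gamma_k^2} \le \left(\sup_{k\in\mathbb{K}} \lambda_k\right) \sum_{k\in\mathbb{K}} \frac{\lambda_k}{\gamma_k^2} < \infty.$$

The equivalence between $\mu_0$ and $\mu^n$ is immediate because $\mu^n$ is the image of $\mu_n$ 
under a unitary map $e^{-t_n\mathcal{L}}$, and $\mathcal{C}^n = \mathcal{C}_n$. □ 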

To illustrate the conditions of the theorem, let $(-\Delta)$ denote the negative Laplacian 
with domain $D(-\Delta) = H^2(\mathbb{T}^2)$. Assume that $\mathcal{C}_0 \propto (-\Delta + k_A I)^{-A}$ and $\Gamma \propto 
(-\Delta + k_B I)^{-B}$, where the conditions $A > 1$, $B > 1$ and $k_A, k_B > 0$ ensure, respectively, 
that the two operators are trace-class and positive-definite. Then the condition 
$\sum_{k\in\mathbb{K}} \lambda_k/\gamma_k^2 < \infty$ reduces to $2B < A - 1$. 
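
The condition $2B < A - 1$ can be explored numerically through partial sums of $\lambda_k/\gamma_k^2$. The sketch below assumes the unit torus, so that $-\Delta\phi_k = 4\pi^2|k|^2\phi_k$, and sets the proportionality constants and $k_A$, $k_B$ to one; these choices and all function names are illustrative assumptions only.

```python
import math


def eigenvalue_ratio(k1, k2, A, B, kappa_a=1.0, kappa_b=1.0):
    """lambda_k / gamma_k^2 for C_0 = (-Lap + kappa_a I)^{-A}, Gamma = (-Lap + kappa_b I)^{-B}."""
    lap = 4.0 * math.pi ** 2 * (k1 * k1 + k2 * k2)  # eigenvalue of -Laplacian on phi_k
    lam = (lap + kappa_a) ** (-A)
    gam = (lap + kappa_b) ** (-B)
    return lam / gam ** 2


def partial_sum(A, B, kmax):
    """Sum of lambda_k / gamma_k^2 over all k with |k_1|, |k_2| <= kmax."""
    return sum(eigenvalue_ratio(k1, k2, A, B)
               for k1 in range(-kmax, kmax + 1)
               for k2 in range(-kmax, kmax + 1))


def satisfies_condition(A, B):
    """The summability criterion 2B < A - 1 stated in the text."""
    return 2 * B < A - 1


if __name__ == "__main__":
    for A, B in [(4.0, 1.2), (2.0, 1.5)]:
        print(A, B, satisfies_condition(A, B), partial_sum(A, B, 20), partial_sum(A, B, 40))
```

When the criterion holds the partial sums settle quickly as `kmax` grows, whereas when it fails they keep increasing.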
3. Measure Consistency and Time-Independent Model Error 

In this section we study the large data limit ($n \to \infty$) of the smoother $\mu_n$ and the filter $\mu^n$ in 
Corollary 2.2. We study measure consistency, namely the question of whether 
the filtering or smoothing distribution converges to a Dirac measure on the true signal 
as n increases. We study situations where the data is generated as a single realization 
of the statistical model itself, so there is no model error, and situations where model 
error is present. 



3.1. Data model 

Suppose that the true signal, denoted $v'_n = v'(\cdot, t_n)$, can be different from the solution 
$v_n$ computed via the model and instead solves 

$$\partial_t v' + \mathcal{L}' v' = 0, \qquad v'(0) = u \tag{12}$$

for another anti-Hermitian operator $\mathcal{L}'$ on $L^2(\mathbb{T}^2)$ and fixed $u \in L^2(\mathbb{T}^2)$. We further 
assume possible error in the observational noise model so that the actual noise in the 
data is not $\eta_n$ but $\eta'_n$. Then, instead of $y_n$ given by (10), what we actually incorporate 
into the filter is the true data, a single realization $y'_n$ determined by $v'_n$ and $\eta'_n$ as follows: 

$$y'_n = v'_n + \eta'_n = e^{-t_n\mathcal{L}'} u + \eta'_n. \tag{13}$$

We again use $e^{-t\mathcal{L}'}$ to denote the forward solution operator, now for (12), through $t$ time 
units. Note that each realization (13) is an element in the probability space $\Omega'$, which is 
independent of $\Omega$. For the data $Y'_n = \{y'_1, \dots, y'_n\}$, let $\mu'_n$ be the measure $\mathbb{P}(v_0|Y_n = Y'_n)$. 
This measure is Gaussian and is determined by the mean in Equation (9a) with $y_l$ 
replaced by $y'_l$, which we relabel as $m'_n$, and the covariance operator in Equation (9b), 
which does not depend on the data, so we retain the notation $\mathcal{C}_n$. Clearly, using (13) we 
obtain 

$$\left(nI + \Gamma \mathcal{C}_0^{-1}\right) m'_n = \Gamma \mathcal{C}_0^{-1} m_0 + \sum_{l=0}^{n-1} e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u + \sum_{l=0}^{n-1} e^{t_{l+1}\mathcal{L}} \eta'_{l+1}, \tag{14}$$

where $m'_0 = m_0$. 

This conditioned mean $m'_n$ differs from $m_n$ in that $e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u$ and $\eta'_{l+1}$ appear 
instead of $u$ and $\eta_{l+1}$, respectively. The reader will readily generalize both the statement 
and proof of Theorem 2.4 to show the equivalence of $\mu_0$, $\mu'_n$ and $(\mu')^n \equiv \mathbb{P}(v_n|Y_n = Y'_n) = 
\mathcal{N}((m')^n, \mathcal{C}^n)$, showing the well-definedness of these conditional distributions even with 
errors in both forward model and observational noise. We now study the large data 
limit for the filtering problem, in the idealized scenario where $\mathcal{L} = \mathcal{L}'$ (Theorem 3.2) 
and the more realistic scenario with model error so that $\mathcal{L} \neq \mathcal{L}'$ (Theorem 3.7). We 
allow possible model error in the observation noise for both theorems, so that the noises 
$\eta'_n$ are draws from i.i.d. Gaussians $\eta'_n \sim \mathcal{N}(0, \Gamma')$ and $\Gamma'\phi_k = \gamma'_k\phi_k$. Even though 
$\gamma'_k$ (equivalently $\Gamma'$) and $\mathcal{L}'$ are not exactly known in most practical situations, their 
asymptotics can be predicted within a certain degree of accuracy. Therefore, without 
limiting the applicability of the theory, the following are assumed in all subsequent 
theorems and corollaries in this paper: 

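Assumptions 3.1. There are positive real numbers $s, \kappa \in \mathbb{R}^+$ such that: 

(i) $\sum_{k\in\mathbb{K}^+} |k|^{2s}\gamma_k < \infty$, $\sum_{k\in\mathbb{K}^+} |k|^{2s}\gamma'_k < \infty$; 
(ii) $\gamma_k/\lambda_k = O(|k|^{\kappa})$; 
(iii) $m_0, u \in H^{s+\kappa}(\mathbb{T}^2)$. 

Then Assumptions 3.1(i) imply that $\eta \sim \mathcal{N}(0, \Gamma)$ and $\eta' \sim \mathcal{N}(0, \Gamma')$ are in $H^s(\mathbb{T}^2)$ 
since $\mathbb{E}\|\eta\|^2_{H^s(\mathbb{T}^2)} < \infty$ and $\mathbb{E}\|\eta'\|^2_{H^s(\mathbb{T}^2)} < \infty$. 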

3.2. Limit of the conditioned measure without model error 

We first study the large data limit of the measure $\mu'_n$ without model error. 

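Theorem 3.2. For the statistical model (1) and (3), suppose that the data $Y'_n = 
\{y'_1, \dots, y'_n\}$ is created from (13) with $\mathcal{L} = \mathcal{L}'$. Then, as $n \to \infty$, $\mathbb{E}(v_0|Y'_n) = m'_n \to u$ 
in the sense that 

$$\|m'_n - u\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O\left(n^{-1/2}\right), \tag{15a}$$

$$\|m'_n - u\|_{H^s(\mathbb{T}^2)} = o\left(n^{-\theta}\right) \quad \Omega'\text{-a.s.}, \tag{15b}$$

for the probability space $\Omega'$ generating the true observation noise $\{\eta'_n\}_{n\in\mathbb{N}}$, and for any 
non-negative $\theta < 1/2$. Furthermore, $\mathcal{C}_n \to 0$ in the sense that its operator norm from 
$L^2(\mathbb{T}^2)$ to $H^s(\mathbb{T}^2)$ satisfies 

$$\|\mathcal{C}_n\|_{L(L^2(\mathbb{T}^2); H^s(\mathbb{T}^2))} = O\left(n^{-1}\right). \tag{16}$$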

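Proof. From Equation (14) with $\mathcal{L} = \mathcal{L}'$, we have 

$$\left(nI + \Gamma \mathcal{C}_0^{-1}\right) m'_n = \Gamma \mathcal{C}_0^{-1} m_0 + nu + \sum_{l=0}^{n-1} e^{t_{l+1}\mathcal{L}} \eta'_{l+1},$$

thus 

$$\left(nI + \Gamma \mathcal{C}_0^{-1}\right)(m'_n - u) = \Gamma \mathcal{C}_0^{-1}(m_0 - u) + \sum_{l=0}^{n-1} e^{t_{l+1}\mathcal{L}} \eta'_{l+1}.$$

Take $\{(g_k^l)'\}_{k\in\mathbb{K}}$, where $l = 0, \dots, n-1$, to be an i.i.d. sequence of complex-valued 
unit Gaussians subject to the constraint that $(g_{-k}^l)' = \overline{(g_k^l)'}$. Then the Karhunen-Loève 
expansion for $e^{t_{l+1}\mathcal{L}} \eta'_{l+1} \sim \mathcal{N}(0, \Gamma')$ is given by 

$$e^{t_{l+1}\mathcal{L}} \eta'_{l+1} = \sum_{k\in\mathbb{K}} \sqrt{\gamma'_k}\, (g_k^l)'\, \phi_k. \tag{17}$$

It follows that 

$$\begin{aligned}
\|m'_n - u\|^2_{L^2(\Omega'; H^s(\mathbb{T}^2))} &= \mathbb{E}\|m'_n - u\|^2_{H^s(\mathbb{T}^2)} \\
&= \sum_{k\in\mathbb{K}^+} |k|^{2s}\, \mathbb{E}|(m'_n - u, \phi_k)|^2 + \mathbb{E}|(m'_n - u, \phi_0)|^2 \\
&= \sum_{k\in\mathbb{K}^+} \frac{|k|^{2s}}{(n + \gamma_k/\lambda_k)^2} \left(\left(\frac{\gamma_k}{\lambda_k}\right)^2 |(m_0 - u, \phi_k)|^2 + \gamma'_k n\right) \\
&\qquad + \frac{1}{(n + \gamma_0/\lambda_0)^2} \left(\left(\frac{\gamma_0}{\lambda_0}\right)^2 |(m_0 - u, \phi_0)|^2 + \gamma'_0 n\right) \\
&\le \sum_{k\in\mathbb{K}^+} \frac{|k|^{2s}}{n^2} \left(\left(\frac{\gamma_k}{\lambda_k}\right)^2 |(m_0 - u, \phi_k)|^2 + \gamma'_k n\right) + \frac{1}{n^2} \left(\left(\frac{\gamma_0}{\lambda_0}\right)^2 |(m_0 - u, \phi_0)|^2 + \gamma'_0 n\right) \\
&\le \frac{C}{n^2} \|m_0 - u\|^2_{H^{s+\kappa}(\mathbb{T}^2)} + \frac{1}{n} \left(\sum_{k\in\mathbb{K}^+} |k|^{2s} \gamma'_k + \gamma'_0\right),
\end{aligned}$$

and so we have Equation (15a). Here and throughout the paper, $C$ is a constant that 
may change from line to line. 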

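Equation (15b) follows from the Borel-Cantelli Lemma [30]: for arbitrary $\epsilon > 0$, we 
have 

$$\begin{aligned}
\sum_{n\in\mathbb{N}} \mathbb{P}\left(n^{\theta} \|m'_n - u\|_{H^s(\mathbb{T}^2)} > \epsilon\right) &\le \sum_{n\in\mathbb{N}} \frac{n^{2r\theta}}{\epsilon^{2r}}\, \mathbb{E}\|m'_n - u\|^{2r}_{H^s(\mathbb{T}^2)} && \text{(Markov inequality)} \\
&\le \sum_{n\in\mathbb{N}} \frac{C n^{2r\theta}}{\epsilon^{2r}} \left(\mathbb{E}\|m'_n - u\|^2_{H^s(\mathbb{T}^2)}\right)^r && \text{(Lemma A.5)} \\
&\le \sum_{n\in\mathbb{N}} \frac{C}{\epsilon^{2r} n^{r(1-2\theta)}} < \infty && \text{(by (15a))}
\end{aligned}$$

and if $\theta \in (0, 1/2)$ then we can choose $r$ such that $r(1 - 2\theta) > 1$. 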
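Finally, for $h = \sum_{k\in\mathbb{K}} h_k \phi_k$, 

$$\begin{aligned}
\|\mathcal{C}_n\|^2_{L(L^2(\mathbb{T}^2); H^s(\mathbb{T}^2))} &= \sup_{\|h\|_{L^2(\mathbb{T}^2)} \le 1} \|\mathcal{C}_n h\|^2_{H^s(\mathbb{T}^2)} \\
&= \sup_{\|h\|_{L^2(\mathbb{T}^2)} \le 1} \sum_{k\in\mathbb{K}} |k|^{2s} \left|n\gamma_k^{-1} + \lambda_k^{-1}\right|^{-2} |h_k|^2 \\
&\le C \sum_{k\in\mathbb{K}} |k|^{2s} \left|n\gamma_k^{-1} + \lambda_k^{-1}\right|^{-2},
\end{aligned}$$

and use the fact that $\sup_{k\in\mathbb{K}} \gamma_k < \infty$, since $\Gamma$ is trace-class, together with 
Assumptions 3.1(i) to get the desired convergence rate. □ 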
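
The rates in Theorem 3.2 can be read off mode by mode from the expressions appearing in the proof. The sketch below evaluates, for one Fourier mode, the posterior variance from (9b) and the mean-square error of the smoother mean; the function names and parameter values are hypothetical, and the formula for the error assumes the perfect model case $\mathcal{L} = \mathcal{L}'$.

```python
def posterior_variance(n, lam, gamma):
    """Eigenvalue of C_n in (9b) for one Fourier mode."""
    return 1.0 / (n / gamma + 1.0 / lam)


def mean_square_error(n, lam, gamma, gamma_true, prior_error):
    """E|(m'_n - u, phi_k)|^2 for one mode in the perfect model case.

    prior_error is (m_0 - u, phi_k); gamma_true is the eigenvalue of the
    true noise covariance Gamma', which may differ from the modelled gamma.
    """
    ratio = gamma / lam
    return (ratio ** 2 * abs(prior_error) ** 2 + gamma_true * n) / (n + ratio) ** 2


if __name__ == "__main__":
    # Hypothetical mode: lam = 1, gamma = 0.5, true noise variance 0.8, prior error 1.
    for n in (10, 100, 1000, 10000):
        print(n,
              n * posterior_variance(n, 1.0, 0.5),
              n * mean_square_error(n, 1.0, 0.5, 0.8, 1.0))
```

Both rescaled quantities approach constants as `n` grows, in line with the $O(n^{-1})$ decay of $\mathcal{C}_n$ in (16) and the $O(n^{-1/2})$ decay of the root-mean-square error in (15a).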



Corollary 3.3. Under the same assumptions as Theorem 3.2, if Equation (15b) holds 
with $s > 1$, then 

$$\|m'_n - u\|_{L^{\infty}(\mathbb{T}^2)} = o\left(n^{-\theta}\right) \quad \Omega'\text{-a.s.}, \tag{18}$$

for any non-negative $\theta < 1/2$. 

Proof. Equation (18) is immediate from Equation (15b) and the Sobolev embedding 

$$\|\cdot\|_{L^{\infty}(\mathbb{T}^2)} \le C \|\cdot\|_{H^s(\mathbb{T}^2)},$$

when $s > d/2 = 1$, since $d = 2$ is the dimension of the domain. □ 



Corollary 3.4. Under the same assumptions as Theorem 3.2, as $n \to \infty$, the $\Omega'$-a.s. 
weak convergence $\mu'_n \Rightarrow \delta_u$ holds in $L^2(\mathbb{T}^2)$. 

Proof. To prove this result we apply Example 3.8.15 in [31]. This shows that for 
weak convergence of $\mu'_n = \mathcal{N}(m'_n, \mathcal{C}_n)$, a Gaussian measure on $\mathcal{H}$, to a limit measure 
$\mu = \mathcal{N}(m, \mathcal{C})$ on $\mathcal{H}$, it suffices to show that $m'_n \to m$ in $\mathcal{H}$, that $\mathcal{C}_n \to \mathcal{C}$ in $L(\mathcal{H}, \mathcal{H})$ and 
that second moments converge. Note, also, that Dirac measures, and more generally 
semi-definite covariance operators, are included in the definition of Gaussian. The 
convergence of the means and the covariance operators follows from Equations (15b) 
and (16) with $m = u$, $\mathcal{C} = 0$ and $\mathcal{H} = L^2(\mathbb{T}^2)$. The convergence of the second moments 
follows if the trace of $\mathcal{C}_n$ converges to zero. From (9b) it follows that the trace of $\mathcal{C}_n$ is 
bounded by $n^{-1}$ multiplied by the trace of $\Gamma$. But $\Gamma$ is trace-class as it is a covariance 
operator on $L^2(\mathbb{T}^2)$ and so the desired result follows. □ 

In fact, the weak convergence in the Prokhorov metric between $\mu'_n$ and $\delta_u$ holds, 
and the methodology in [23] could be used to quantify its rate of convergence. 

We now obtain the large data limit of the filtering distribution $(\mu')^n$ without 
model error from the smoothing limit (15). Recall that this measure is the Gaussian 
$\mathcal{N}((m')^n, \mathcal{C}_n)$. 

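Theorem 3.5. Under the same assumptions as Theorem 3.2, as $n \to \infty$, $(m')^n - v'_n \to 0$ 
in the sense that 

$$\|(m')^n - v'_n\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O\left(n^{-1/2}\right), \tag{19a}$$

$$\|(m')^n - v'_n\|_{H^s(\mathbb{T}^2)} = o\left(n^{-\theta}\right) \quad \Omega'\text{-a.s.}, \tag{19b}$$

for any non-negative $\theta < 1/2$. 

Proof. Equation (19b) follows from 

$$\|(m')^n - v'_n\|_{H^s(\mathbb{T}^2)} = \left\|e^{-t_n\mathcal{L}} m'_n - e^{-t_n\mathcal{L}} u\right\|_{H^s(\mathbb{T}^2)} = \left\|e^{-t_n\mathcal{L}}(m'_n - u)\right\|_{H^s(\mathbb{T}^2)} \le \left\|e^{-t_n\mathcal{L}}\right\|_{L(H^s(\mathbb{T}^2); H^s(\mathbb{T}^2))} \|m'_n - u\|_{H^s(\mathbb{T}^2)}.$$

Then Equation (19a) follows from 

$$\|(m')^n - v'_n\|^2_{L^2(\Omega'; H^s(\mathbb{T}^2))} = \mathbb{E}\|(m')^n - v'_n\|^2_{H^s(\mathbb{T}^2)} \le \mathbb{E}\|m'_n - u\|^2_{H^s(\mathbb{T}^2)}.$$

□ 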
The following corollary has the same proof as Corollary 3.4, and so we omit it. 

Corollary 3.6. Under the same assumptions as Theorem 3.2, as $n \to \infty$, the $\Omega'$-a.s. 
weak convergence $(\mu')^n - \delta_{v'_n} \Rightarrow 0$ holds in $L^2(\mathbb{T}^2)$. 

Theorem 3.5 and Corollary 3.6 are consistent with known results concerning 
large data limits of the finite dimensional Kalman filter shown in [32]. However, 
Equations (19) and (16) provide convergence rates in an infinite dimensional space 
and hence cannot be derived from the finite dimensional theory; mathematically this is 
because the rate of convergence in each Fourier mode will depend on the wavenumber 
k, and the infinite dimensional analysis requires this dependence to be tracked and 
quantified, as we do here. 

3.3. Limit of the conditioned measure with time-independent model error 

The previous subsection shows measure consistency results for data generated by the 
same PDE as that used in the filtering model. In this subsection, we study the consequences 
of using data generated by a different PDE. It is important to point out that, in view 
of Equation (14), the limit of $m'_n$ is determined by the time average of $e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u$, i.e., 

$$\frac{1}{n} \sum_{l=0}^{n-1} e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u. \tag{20}$$

For general anti-Hermitian $\mathcal{L}$ and $\mathcal{L}'$, obtaining an analytic expression for the limit of 
the average (20), as $n \to \infty$, is very hard. Therefore in the remainder of the section 
we examine the case in which $\mathcal{L} = c \cdot \nabla$ and $\mathcal{L}' = c' \cdot \nabla$ with different constant wave 
velocities $c$ and $c'$, respectively. A highly nontrivial filter divergence takes place even in 
this simple example. 

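We use the notation $\mathcal{F}_{(p,q)} f \equiv \sum_{(k_1/p,\, k_2/q) \in \mathbb{Z}\times\mathbb{Z}} (f, \phi_k)\phi_k$ for part of the Fourier series 
of $f \in L^2(\mathbb{T}^2)$, and $\langle f \rangle \equiv (f, \phi_0) = \int_{\mathbb{T}^2} f(x, y)\, dx\, dy$ for the spatial average of $f$ on the 
torus. We also denote by $\delta c \equiv c - c'$ the difference between wave velocities. 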

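Theorem 3.7. For the statistical model (1) and (3) with $\mathcal{L} = c \cdot \nabla$, suppose that the 
data $Y'_n = \{y'_1, \dots, y'_n\}$ is created from (13) with $\mathcal{L}' = c' \cdot \nabla$ and that $\delta c \neq 0 \bmod (1, 1)$ 
(equivalently $\delta c \notin \mathbb{Z} \times \mathbb{Z}$). As $n \to \infty$, 

(i) if $\Delta t\, \delta c = (p'/p, q'/q) \in \mathbb{Q} \times \mathbb{Q}$ and $\gcd(p', p) = \gcd(q', q) = 1$, then $m'_n \to \mathcal{F}_{(p,q)} u$ 
in the sense that 

$$\|m'_n - \mathcal{F}_{(p,q)} u\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O\left(n^{-1/2}\right), \tag{21a}$$

$$\|m'_n - \mathcal{F}_{(p,q)} u\|_{H^s(\mathbb{T}^2)} = o\left(n^{-\theta}\right) \quad \Omega'\text{-a.s.}, \tag{21b}$$

for any non-negative $\theta < 1/2$; 

(ii) if $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, then $m'_n \to \langle u \rangle$ in the sense that 

$$\|m'_n - \langle u \rangle\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = o(1), \tag{22a}$$

$$\|m'_n - \langle u \rangle\|_{H^s(\mathbb{T}^2)} = o(1) \quad \Omega'\text{-a.s.} \tag{22b}$$

Proof. See Appendix B. □ 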



Remark 3.8. It is interesting that it is not the size of $\Delta t\, \delta c$ but its rationality or 
irrationality which determines the limit of $m'_n$. This result may be understood intuitively 
from Equation (20), which reduces to 

$$\frac{1}{n} \sum_{l=0}^{n-1} u\left(\cdot + (l+1)\Delta t\, \delta c\right), \tag{23}$$

when $\mathcal{L} = c \cdot \nabla$ and $\mathcal{L}' = c' \cdot \nabla$. It is possible to guess the large $n$ behaviour of 
Equation (23) using the periodicity of $u$ and an ergodicity argument. The proof of 
Theorem 3.7 tells us that the prediction resulting from this heuristic is indeed correct. 
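
The heuristic can be illustrated by computing, for each wavenumber $k$, the factor by which the average (23) multiplies the Fourier coefficient $(u, \phi_k)$. The sketch below is only an illustration: the function name is hypothetical, and the rational example uses coprime denominators $p = 2$, $q = 3$ (an assumption of this sketch), so that $k \cdot \Delta t\, \delta c$ is an integer exactly when $p$ divides $k_1$ and $q$ divides $k_2$.

```python
import cmath
import math


def averaged_multiplier(k, dt_dc, n):
    """Multiplier of the mode phi_k in the time average (23).

    k     : wavenumber (k1, k2)
    dt_dc : the vector Delta t * delta c
    n     : number of observations
    Returns (1/n) * sum_{l=0}^{n-1} exp(2*pi*i * (k . dt_dc) * (l + 1)).
    """
    theta = k[0] * dt_dc[0] + k[1] * dt_dc[1]
    return sum(cmath.exp(2j * math.pi * theta * (l + 1)) for l in range(n)) / n


if __name__ == "__main__":
    rational = (1.0 / 2.0, 1.0 / 3.0)               # Delta t * delta c = (1/2, 1/3)
    irrational = (math.sqrt(2.0), math.sqrt(3.0))   # both components irrational
    for k in [(0, 0), (1, 0), (1, 1), (2, 3)]:
        print(k,
              abs(averaged_multiplier(k, rational, 600)),
              abs(averaged_multiplier(k, irrational, 600)))
```

In the rational case the modulus stays at one only for modes retained by $\mathcal{F}_{(2,3)}$, while in the irrational case it decays for every mode shown except the spatial average $k = (0, 0)$.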

The proof of Corollary 3.3 tells us that, whenever $s > 1$, the $H^s(\mathbb{T}^2)$-norm 
convergence in (21b) or (22b) implies the almost everywhere convergence on $\mathbb{T}^2$ with the 
same order. Therefore, $m'_n \to \mathcal{F}_{(p,q)} u$ or $m'_n \to \langle u \rangle$ a.e. on $\mathbb{T}^2$, from Equation (21b) or 
from Equation (22b), when $\Delta t\, \delta c = (p'/p, q'/q) \in \mathbb{Q} \times \mathbb{Q}$ and $\gcd(p', p) = \gcd(q', q) = 1$, 
or when $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, respectively. We will not repeat the statement of the 
corresponding result in the subsequent theorems. 



When $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, Equation (22) is obtained using 

$$\frac{1}{n} \sum_{l=0}^{n-1} e^{2\pi i (k \cdot \delta c)\, t_{l+1}} = o(1), \tag{24}$$

as $n \to \infty$, from the theory of ergodicity [33]. The convergence rate of Equation (22) 
can be improved if we have higher order convergence in Equation (24). It must be noted 
that in general there exists a fundamental relationship between the limits of $m'_n$ and the 
left-hand side of Equation (24) for various $c$ and $c'$, as we will see. 

Note also from Theorem 3.2 and Theorem 3.7 that the limit of $m'_n$ does not depend 
on the observation noise error $\Gamma \neq \Gamma'$ but does depend sensitively on the model error 
$\mathcal{L} \neq \mathcal{L}'$. 


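Corollary 3.9. Under the same assumptions as Theorem 3.7, as $n \to \infty$, the $\Omega'$-a.s. 
weak convergence $\mu'_n \Rightarrow \delta_{\mathcal{F}_{(p,q)} u}$ or $\mu'_n \Rightarrow \delta_{\langle u \rangle}$ holds, when $\Delta t\, \delta c = (p'/p, q'/q) \in \mathbb{Q} \times \mathbb{Q}$ 
and $\gcd(p', p) = \gcd(q', q) = 1$, or when $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, respectively. 

Proof. This is the same as the proof of Corollary 3.4, so we omit it. □ 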

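Remark 3.10. Our theorems show that the observation accumulation limit and the 
vanishing model error limit cannot be switched, i.e., 

$$\lim_{n\to\infty} \lim_{\|\delta c\|\to 0} \|m'_n - u\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = 0,$$

$$\lim_{\|\delta c\|\to 0} \lim_{n\to\infty} \|m'_n - u\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} \neq 0.$$

Note the second limit is nonzero because $m'_n$ converges either to $\mathcal{F}_{(p,q)} u$ or $\langle u \rangle$. □ 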

We can also study the effect of model error on the filtering distribution, instead of 
the smoothing distribution. The following theorem extends Theorem 3.7 to study the 
measure $(\mu')^n$, showing that the truth $v'_n$ is not recovered if $\delta c \neq 0$. This result should 
be compared with Theorem 3.5 in the case of no model error. 

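Theorem 3.11. Under the same assumptions as Theorem 3.7, as $n \to \infty$, 

(i) if $\Delta t\, \delta c = (p'/p, q'/q) \in \mathbb{Q} \times \mathbb{Q}$ and $\gcd(p', p) = \gcd(q', q) = 1$, then $(m')^n - 
\mathcal{F}_{(p,q)} e^{-t_n\mathcal{L}} u \to 0$ in the sense that 

$$\|(m')^n - \mathcal{F}_{(p,q)} e^{-t_n\mathcal{L}} u\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O\left(n^{-1/2}\right), \tag{25a}$$

$$\|(m')^n - \mathcal{F}_{(p,q)} e^{-t_n\mathcal{L}} u\|_{H^s(\mathbb{T}^2)} = o\left(n^{-\theta}\right) \quad \Omega'\text{-a.s.}, \tag{25b}$$

for any non-negative $\theta < 1/2$; 

(ii) if $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, then $(m')^n \to \langle u \rangle$ in the sense that 

$$\|(m')^n - \langle u \rangle\|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = o(1), \tag{26a}$$

$$\|(m')^n - \langle u \rangle\|_{H^s(\mathbb{T}^2)} = o(1) \quad \Omega'\text{-a.s.} \tag{26b}$$

Proof. This is the same as the proof of Theorem 3.5, except that $\mathcal{F}_{(p,q)} e^{-t_n\mathcal{L}} u$ or $\langle u \rangle$ is used 
in place of $v'_n$, so we omit it. □ 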

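Corollary 3.12. Under the same assumptions as Theorem 3.7, as $n \to \infty$, the $\Omega'$-a.s. 
weak convergence $(\mu')^n - \delta_{\mathcal{F}_{(p,q)} e^{-t_n\mathcal{L}} u} \Rightarrow 0$ or $(\mu')^n - \delta_{\langle u \rangle} \Rightarrow 0$ holds, when $\Delta t\, \delta c = 
(p'/p, q'/q) \in \mathbb{Q} \times \mathbb{Q}$ and $\gcd(p', p) = \gcd(q', q) = 1$, or when $\Delta t\, \delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, 
respectively. 

Proof. This is the same as the proof of Corollary 3.4, so we omit it. □ 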

Theorem 3.2 and Theorem 3.5 show that, in the perfect model scenario, the 
smoothing distribution on the initial condition and filtering distribution recover the true 
initial condition and true signal, respectively, even if the statistical model fails to capture 
the genuine covariance structure of the data. Theorem 3.7 and Theorem 3.11 show that 
the smoothing distribution on the initial condition, and the filtering distribution, do not 
converge to the truth, in the large data limit, when the model error corresponds to a 
constant shift in wave velocity, however small. In this case the wave velocity difference 
causes an advection in Equation (20) leading to recovery of only part of the Fourier 
expansion of $u$ as a limit of $m'_n$. The next section concerns time-dependent model error 
in the wave velocity, and includes a situation intermediate between those considered in 
this section. In particular, a situation where the smoothing distribution is not recovered 
correctly, but the filtering distribution is. 
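
The dichotomy between rational and irrational velocity shifts can be illustrated by the time 
average of the phase factor $e^{2\pi i (k \cdot \delta c)\,\Delta t\, l}$ that a constant shift imposes on Fourier mode $k$: 
the average equals one when $(k \cdot \delta c)\,\Delta t$ is an integer and tends to zero otherwise. The 
following is a minimal NumPy sketch of this fact, not part of the computations reported 
in this paper; the function name `phase_average`, the choice $\Delta t = 1$, the sample size and 
the wavevectors are all illustrative assumptions. 

```python
import numpy as np

def phase_average(k, shift, n):
    """Return (1/n) * sum_{l=1}^{n} exp(2*pi*i * l * (k . shift)).

    `shift` plays the role of Delta t * delta c (an assumption of this sketch);
    `k` is an integer wavevector.
    """
    theta = float(np.dot(k, shift))
    l = np.arange(1, n + 1)
    return np.exp(2j * np.pi * theta * l).mean()

n = 20000
rational = (0.5, 0.5)                   # (p'/p, q'/q) with p = q = 2
irrational = (1 / np.e, 1 / np.pi)      # both components irrational

surviving = abs(phase_average((2, 0), rational, n))    # (k . shift) is an integer
killed = abs(phase_average((1, 0), rational, n))       # (k . shift) = 1/2
ergodic = abs(phase_average((1, 0), irrational, n))    # irrational rotation
```

In this sketch `surviving` stays at one while `killed` and `ergodic` are small, which mirrors 
the partial recovery of the Fourier expansion for rational shifts and the collapse to the 
spatial mean for irrational shifts. 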



4. Time-Dependent Model Error 

In the previous section we studied model error for autonomous problems where the 
operators $\mathcal{L}$ and $\mathcal{L}'$ (and hence $c$ and $c'$) are assumed time-independent. However, our 
approach can be generalized to situations where both operators are time-dependent: 
$\mathcal{L}(t) = c(t) \cdot \nabla$ and $\mathcal{L}'(t) = c'(t) \cdot \nabla$. To this end, this section is devoted to two problems 
where both the statistical model and the data are generated by non-autonomous 
dynamics. In the first case the dynamics are deterministic with $c(t) - c'(t) \to 0$ (Theorem 4.1); 
in the second case the data is generated by non-autonomous random 
dynamics with $\mathcal{L}'(t; \omega') = c'(t; \omega') \cdot \nabla$ and $c'(t; \omega')$ being a random function fluctuating 
around $c$ (Theorem 4.3). Here $\omega'$ denotes an element in the probability space that 
generates $c'(t; \omega')$ and $\eta'_n$, assumed independent. 
Now the statistical model (1) becomes 

$$\partial_t v + c(t) \cdot \nabla v = 0, \qquad v(x, 0) = v_0(x). \tag{27}$$

Unless $c(t)$ is constant in time, the operator $e^{-t\mathcal{L}}$ is not a semigroup operator but for 
convenience we still employ this notation to represent the forward solution operator from 
time 0 to time $t$ even for non-autonomous dynamics. Then the solution of Equation (27) 
is denoted by 

$$v(x,t) = \left(e^{-t\mathcal{L}} v_0\right)(x) = v_0\left(x - \int_0^t c(s)\,ds\right).$$

This will correspond to a classical solution if $v_0 \in C^1(\mathbb{T}^2, \mathbb{R})$ and $c \in C(\mathbb{R}^+, \mathbb{R}^2)$; 
otherwise it will be a weak solution. Similarly, the notation $e^{-t\mathcal{L}'}$ will be used for the 
forward solution operator from time 0 to time $t$ given the non-autonomous deterministic 
or random dynamics, i.e., Equation (12) becomes 

$$\partial_t v' + c'(t; \omega') \cdot \nabla v' = 0, \qquad v'(0) = u, \tag{28}$$

and we define the solution of Equation (28) by 

$$v'(x, t; \omega') = \left(e^{-t\mathcal{L}'} u\right)(x) = u\left(x - \int_0^t c'(s; \omega')\,ds\right)$$

under the assumption that $\int_0^t c'(s; \omega')\,ds$ is well-defined. We will be particularly 
interested in the case where $c'(t; \omega')$ is an affine function of a Brownian white noise 
and then this expression corresponds to the Stratonovich solution of the PDE (28). 
Note the term $e^{t_{l+1}(\mathcal{L}-\mathcal{L}')} u$ in Equation (17) should be interpreted as 

$$\left(e^{t_{l+1}(\mathcal{L}-\mathcal{L}')} u\right)(x) = u\left(x + \int_0^{t_{l+1}} \left(c(s) - c'(s; \omega')\right) ds\right).$$
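
Since the forward solution operators above are translations on the torus, they act on 
Fourier coefficients by a phase shift. The following NumPy sketch evaluates 
$v_0(x - \int_0^t c(s)\,ds)$ on a periodic grid under that observation. It is an illustration only: 
the helper names `translate_on_torus` and `integrated_velocity`, the trapezoidal quadrature, 
the grid size and the test function are assumptions of the sketch and are not taken from the 
experiments in Section 5. 

```python
import numpy as np

def translate_on_torus(v0, shift):
    """Return samples of v0(x - shift) on the unit torus via a Fourier phase shift.

    v0 : (N1, N2) array of samples on the grid x_j = j / N; exact for band-limited v0.
    shift : length-2 sequence, e.g. the integrated wave velocity.
    """
    n1, n2 = v0.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)[:, None]    # integer wavenumbers
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)[None, :]
    phase = np.exp(-2j * np.pi * (k1 * shift[0] + k2 * shift[1]))
    return np.real(np.fft.ifft2(np.fft.fft2(v0) * phase))

def integrated_velocity(c, t, num=2001):
    """Trapezoidal approximation of int_0^t c(s) ds for c: float -> array of shape (2,)."""
    s = np.linspace(0.0, t, num)
    values = np.array([c(si) for si in s])          # shape (num, 2)
    ds = s[1] - s[0]
    return ds * (values[0] / 2 + values[1:-1].sum(axis=0) + values[-1] / 2)

# Illustrative check with a constant velocity and a band-limited initial condition.
N = 32
x = np.arange(N) / N
X1, X2 = np.meshgrid(x, x, indexing="ij")
v0 = np.sin(2 * np.pi * X1) + np.cos(4 * np.pi * X2)
c = lambda s: np.array([-0.5, -1.0])
shift = integrated_velocity(c, t=0.3)
v = translate_on_torus(v0, shift)
exact = np.sin(2 * np.pi * (X1 - shift[0])) + np.cos(4 * np.pi * (X2 - shift[1]))
```

Replacing `c` by a time-dependent (or sampled random) velocity changes only the value of 
`shift`; the translation step is unchanged. 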





We now study the case where both $c(t)$ and $c'(t)$ are deterministic time-dependent 
wave velocities. We here exhibit an intermediate situation between the two previously 
examined cases where the smoothing distribution is not recovered correctly, but the 
filtering distribution is. This occurs when the wave velocity is time-dependent but 
converges in time to the true wave velocity, i.e., $\delta c(t) = c(t) - c'(t) \to 0$, which is of 
interest especially in view of Remark 3.10. Let $u_\alpha = u(\cdot + \alpha)$ be the translation of $u$ by 
$\alpha$. 

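Theorem 4.1. Consider the statistical model (1) and suppose that the data $Y'_n = 
\{y'_1, \dots, y'_n\}$ is created from (12) with $\mathcal{L}(t) = c(t) \cdot \nabla$ and $\mathcal{L}'(t) = c'(t) \cdot \nabla$ where 
$\delta c(t) = c(t) - c'(t)$ satisfies $\int_0^t \delta c(s)\,ds = \alpha + O(t^{-\beta})$. Then, as $n \to \infty$, $m'_n \to u_\alpha$ in 
the sense that 

$$\| m'_n - u_\alpha \|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O(n^{-\phi}), \tag{29a}$$
$$\| m'_n - u_\alpha \|_{H^s(\mathbb{T}^2)} = o(n^{-\theta}) \quad \Omega'\text{-a.s.}, \tag{29b}$$

for $\phi = 1/2 \wedge \beta$ and for any non-negative $\theta < \phi$. 

Proof. See Appendix B. □ 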

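Theorem 4.2. Under the same assumptions as Theorem 4.1, if $u$ is Lipschitz continuous 
in $H^s(\mathbb{T}^2)$ where $s$ is given in Assumptions 3.1, then $(m')^n - v'_n \to 0$ in the sense that 

$$\| (m')^n - v'_n \|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O(n^{-\phi}), \tag{30a}$$
$$\| (m')^n - v'_n \|_{H^s(\mathbb{T}^2)} = o(n^{-\theta}) \quad \Omega'\text{-a.s.}, \tag{30b}$$

for $\phi = 1/2 \wedge \beta$ and for any non-negative $\theta < \phi$. 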



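Proof. Equation (30b) follows from 

$$\begin{aligned}
\| (m')^n - v'_n \|_{H^s(\mathbb{T}^2)}
&= \left\| e^{-t_n \mathcal{L}} (m'_n - u_\alpha) + \left( e^{-t_n \mathcal{L}} u_\alpha - e^{-t_n \mathcal{L}'} u \right) \right\|_{H^s(\mathbb{T}^2)} \\
&\le \| m'_n - u_\alpha \|_{H^s(\mathbb{T}^2)} + \left\| u\left( \cdot + \alpha - \int_0^{t_n} c(s)\,ds \right) - u\left( \cdot - \int_0^{t_n} c'(s)\,ds \right) \right\|_{H^s(\mathbb{T}^2)} \\
&\le \| m'_n - u_\alpha \|_{H^s(\mathbb{T}^2)} + C \left| \alpha - \int_0^{t_n} \delta c(s)\,ds \right|,
\end{aligned}$$

and Equation (30a) follows from 

$$\begin{aligned}
\| (m')^n - v'_n \|^2_{L^2(\Omega'; H^s(\mathbb{T}^2))}
&= E \| (m')^n - v'_n \|^2_{H^s(\mathbb{T}^2)} \\
&= E \left\| e^{-t_n \mathcal{L}} (m'_n - u_\alpha) + \left( e^{-t_n \mathcal{L}} u_\alpha - e^{-t_n \mathcal{L}'} u \right) \right\|^2_{H^s(\mathbb{T}^2)} \\
&\le 2 \left\| e^{-t_n \mathcal{L}} \right\|^2_{L(H^s(\mathbb{T}^2); H^s(\mathbb{T}^2))} E \| m'_n - u_\alpha \|^2_{H^s(\mathbb{T}^2)} \\
&\quad + 2 \left\| u\left( \cdot + \alpha - \int_0^{t_n} c(s)\,ds \right) - u\left( \cdot - \int_0^{t_n} c'(s)\,ds \right) \right\|^2_{H^s(\mathbb{T}^2)}.
\end{aligned}$$

□ 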



Finally, we study the case where $c(t)$ is deterministic but $c'(t; \omega')$ is a random 
process. We here note that while the true signal solves a linear SPDE with multiplicative 
noise, Equation (28), the statistical model used to filter is a linear deterministic PDE, 
Equation (27). We study the specific case $c'(t; \omega') = c(t) - \varepsilon \dot{W}(t)$, i.e., the deterministic 
wave velocity is modulated by a white noise with small amplitude $\varepsilon > 0$. 

Theorem 4.3. Consider the statistical model (1) and suppose that the data $Y'_n = 
\{y'_1, \dots, y'_n\}$ is created from (12) with $\mathcal{L}(t) = c(t) \cdot \nabla$ and $\mathcal{L}'(t; \omega') = c'(t; \omega') \cdot \nabla$ where 
$\int_0^t c'(s; \omega')\,ds = \int_0^t c(s)\,ds - \varepsilon W(t)$ and $\varepsilon W(t)$ is the Wiener process with amplitude $\varepsilon > 0$. 
Then, as $n \to \infty$, $m'_n \to \langle u \rangle$ in the sense that 

$$\| m'_n - \langle u \rangle \|_{L^2(\Omega'; H^s(\mathbb{T}^2))} = O(n^{-1/2}), \tag{31a}$$
$$\| m'_n - \langle u \rangle \|_{H^s(\mathbb{T}^2)} = o(n^{-\theta}) \quad \Omega'\text{-a.s.}, \tag{31b}$$

for any non-negative $\theta < 1/2$. 

Proof. See Appendix B. □ 



We do not state the corresponding theorem on the mean of the filtering distribution 
as $(m')^n$ converges to the same constant $\langle u \rangle$ with the same rate shown in Equations (31). 

Remark 4.4. In Theorem 4.3 the law of $\varepsilon W(t)$ converges weakly to a uniform distribution 
on the torus by the Lévy-Cramér continuity theorem [30]. The limit $\langle u \rangle$ is the average 
of $u$ with respect to this measure. This result may be generalized to consider the 
case where $\int_0^t (c(s) - c'(s; \omega'))\,ds$ converges weakly to the measure $\nu$ as $t \to \infty$. Then 
$m'_n(\cdot) \to \int u(\cdot + y)\,d\nu(y)$ as $n \to \infty$ in the same norms as used in Theorem 4.3 but, in 
general, with no rate. The key fact underlying the result is 

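$$\frac{1}{n} \sum_{l=0}^{n-1} u\left( \cdot + \int_0^{t_{l+1}} \left( c(s) - c'(s; \omega') \right) ds \right) - \int u(\cdot + y)\,d\nu(y) = o(1) \quad \Omega'\text{-a.s.}, \tag{32}$$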



as $n \to \infty$, which follows from the strong law of large numbers and the Kolmogorov 
zero-one law [30]. Depending on the process $\int_0^t (c(s) - c'(s; \omega'))\,ds$, an algebraic convergence 
rate can be determined if we have a higher order convergence result for the corresponding 
Equation (32). This viewpoint may be used to provide a common framework for all the 
limit theorems in this paper. 
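
The weak convergence of the law of $\varepsilon W(t)$ to the uniform distribution on the torus can be 
seen mode by mode: for a standard two-dimensional Brownian motion, 
$E\,e^{2\pi i k \cdot \varepsilon W(t)} = e^{-2\pi^2 \varepsilon^2 |k|^2 t}$, which decays to zero for every $k \neq 0$. The following sketch 
compares this closed form with a Monte Carlo estimate. It is an illustration only; the 
function name `mode_decay`, the parameter values, the sample size and the random seed are 
assumptions of the sketch. 

```python
import numpy as np

def mode_decay(k, eps, t):
    """E[exp(2*pi*i * k . (eps * W(t)))] for a standard 2-D Brownian motion W."""
    k = np.asarray(k, dtype=float)
    return np.exp(-2.0 * np.pi**2 * eps**2 * np.dot(k, k) * t)

rng = np.random.default_rng(0)
eps, t, k = 0.1, 2.0, np.array([1.0, 2.0])

# W(t) has independent N(0, t) components.
W_t = rng.normal(scale=np.sqrt(t), size=(200_000, 2))
monte_carlo = np.exp(2j * np.pi * eps * (W_t @ k)).mean()
exact = mode_decay(k, eps, t)
```

For large $t$ the factor `mode_decay(k, eps, t)` is negligible for every nonzero $k$, so only the 
zero mode, i.e. the spatial mean $\langle u \rangle$, survives the averaging. 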



5. Numerical Illustrations 

5.1. Algorithmic Setting 

The purpose of this section is twofold: first to illustrate the preceding theorems with 
numerical experiments; and secondly to show that relaxing the statistical model can 
avoid some of the lack of consistency problems that the theorems highlight. All of the 
numerical results we describe are based on Equations (1), (3), with $\mathcal{L} = c \cdot \nabla$ for 
some constant wave velocity $c$, so that the underlying dynamics is given by Equation (11). 
The data is generated by (12), (13) with $\mathcal{L}' = c'(t) \cdot \nabla$, for a variety of choices of $c'(t)$ 
(possibly random), and in subsection 5.2 we illustrate Theorems 3.2, 3.7, 4.1 and 4.3. 
In subsection 5.3 we will also describe a numerical method in which the state of the 
system and the wave velocity are learnt by combining the data and statistical model. 
Since this problem is inherently non-Gaussian we adopt from the outset a Bayesian 
approach which coincides with the (Gaussian) filtering or smoothing approach when the 
wave velocity is fixed, but is sufficiently general to also allow for the wave velocity to be 
part of the unknown state of the system. In both cases we apply function space MCMC 
methods [35] to sample the distribution of interest. Note, however, that the purpose 
of this section is not to determine the most efficient numerical methods, but rather to 
study the properties of the statistical distributions of interest. 

For fixed wave velocity $c$ the statistical model (1), (3) defines a probability 
distribution $P(v_0, Y_n | c)$. This is a Gaussian distribution and the conditional distribution 
$P(v_0 | Y_n, c)$ is given by the measure $\mu_n = \mathcal{N}(m_n, C_n)$ studied in sections 2, 3 and 4. 
In our first set of numerical results, in subsection 5.2, the wave velocity is considered 
known. We sample $P(v_0 | Y_n, c)$ using the function space random walk from [36]. In 
the second set of results, in subsection 5.3, the wave velocity is considered as an 
unknown constant. If we place a prior measure $p(c)$ on the wave velocity then we 
may define $P(c, v_0, Y_n) = P(v_0, Y_n | c)\,p(c)$. We are then interested in the conditional 
distribution $P(c, v_0 | Y_n)$ which is non-Gaussian. We adopt a Metropolis-within-Gibbs 
approach [37, 38] in which we sample alternately from $P(v_0 | c, Y_n)$, which we do as in 
subsection 5.2, and $P(c | v_0, Y_n)$, which we sample using a finite dimensional 
Metropolis-Hastings method. 
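
To indicate what a function space random walk looks like in practice, the following sketch 
implements a preconditioned Crank-Nicolson style proposal on a two-dimensional toy 
problem with a Gaussian prior and Gaussian likelihood. This is a hedged illustration: we 
assume, without having checked the details against [36], that the sampler used there is of 
this prior-preserving type, and the function name `pcn_sample`, the step size `beta`, the 
toy prior, the data and the chain length are all hypothetical choices made for the sketch. 

```python
import numpy as np

def pcn_sample(phi, prior_std, n_steps, beta=0.5, seed=0):
    """Random walk that preserves the prior N(0, diag(prior_std**2)).

    phi : negative log-likelihood; proposals are accepted with probability
          min(1, exp(phi(v) - phi(proposal))).
    """
    rng = np.random.default_rng(seed)
    v = np.zeros_like(prior_std)
    phi_v = phi(v)
    chain = np.empty((n_steps, v.size))
    for i in range(n_steps):
        xi = prior_std * rng.standard_normal(v.size)         # draw from the prior
        proposal = np.sqrt(1.0 - beta**2) * v + beta * xi
        phi_p = phi(proposal)
        if np.log(rng.uniform()) < phi_v - phi_p:
            v, phi_v = proposal, phi_p
        chain[i] = v
    return chain

# Toy problem (illustrative): observe v directly with Gaussian noise.
prior_std = np.array([1.0, 0.5])
y = np.array([0.8, -0.3])
noise_std = 0.5
phi = lambda v: 0.5 * np.sum((y - v) ** 2) / noise_std**2

chain = pcn_sample(phi, prior_std, n_steps=20000)
posterior_mean = chain[2000:].mean(axis=0)                   # discard burn-in
exact_mean = y * prior_std**2 / (prior_std**2 + noise_std**2)
```

Because the proposal leaves the prior invariant, the acceptance probability involves only 
the likelihood, which is what allows such a scheme to be formulated on function space. 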

Throughout the numerical simulations we represent the solution of the wave 
equation on a grid of 2^ x 2^ points, and observations are also taken on this grid. The 
observational noise is white (uncorrelated) with variance cr^ = 10~^ at each grid point. 
The continuum limit of such a covariance operator does not satisfy Assumptions 3.1 
but is used to illustrate the fact that the theoretical results can be generalized to such 
observations. Note also that the numerical results are performed with model error so 
that the aforementioned distributions are sampled with $Y_n = Y'_n$ from (12), (13). 



5.2. Sampling the initial condition with model error 



Throughout we use the wave velocity 

$$c = (-0.5, -1.0), \tag{33}$$

in our statistical model. The true initial condition used to generate the data is 

$$u(x_1, x_2) = \sum_{k_1, k_2 = 1} \sin(2\pi k_1 x_1) + \cos(2\pi k_2 x_2). \tag{34}$$

This function is displayed in Figure 1(a). 


As prior on vq we choose the Gaussian 
-A)~^) where the domain of —A is if^(T^) with constants removed, so that 
it is positive. We implement the MCMC method to sample from $P(v_0 | c, Y_n = Y'_n)$ for 
a number of different data $Y'_n$, corresponding to different choices of $c' = c'(t; \omega')$. We 
calculate the empirical mean of $P(v_0 | c, Y_n = Y'_n)$, which approximates $E(v_0 | c, Y_n = Y'_n)$. 



The results are shown in Figures 1(b)-1(f). In all cases the Markov chain is burnt in for 
10^ iterations, and this transient part of the simulation is not used to compute draws 
from the conditioned measure $P(v_0 | c, Y_n = Y'_n)$. After the burn in we proceed to iterate 
a further lO'' times and use this information to compute the corresponding moments. 
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The data size n is chosen sufficiently large that this distribution is approximately a 
Dirac measure. 

In the perfect model scenario ($c = c'$), the empirical mean shown in Figure 1(b) 
should fully recover the true initial condition $u$ from Theorem 3.2. Comparison with 
Figure 1(a) shows that this is indeed the case, illustrating Corollary 3.3. We now 
demonstrate the effect of model error in the form of a constant shift in the wave velocity: 
Figure 1(c) and Figure 1(d) show the empirical means when $c - c' = (1/2, 1/2) \in \mathbb{Q} \times \mathbb{Q}$ 
and $c - c' = (1/e, 1/\pi) \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, respectively. From Theorem 3.7, the computed 
empirical distribution should be close to, respectively, $\mathcal{F}_{(2,2)} u$ comprising only the mode 
$(k_1, k_2) = (2, 2)$ from (34), or $\langle u \rangle = 0$; this is indeed the case. 

If we choose $c'(t)$ satisfying $\int_0^\infty (c - c'(s))\,ds = (1/2, 1/2)$, then Theorem 4.1 tells 
us that Figure 1(e) should be close to a shift of $u$ by $(1/2, 1/2)$, and this is exactly what 
we observe. In this case, we know from Theorem 4.2 that although the smoother is in 
error, the filter should correctly recover the true $v'_n$ for large $n$. To illustrate this we 
compute $\| E(v_n | c, Y_n = Y'_n) - v'_n \|_{L^2(\mathbb{T}^2)}$ as a function of $n$ and depict it in Figure 2(a). 
This shows convergence to 0 as predicted. To obtain a rate of convergence, we compute 
the gradient of the log-log plot in Figure 2(b). We observe the rate of convergence is 
close to $O(n^{-2})$. Note that this is faster than the theoretical bound of $O(n^{-\phi})$, with 
$\phi = 1/2 \wedge \beta$, given in Equation (30b): this suggests that our convergence theorems do 
not have sharp rates. 
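
Estimating a rate from the gradient of a log-log plot amounts to a least squares fit of 
$\log(\text{error})$ against $\log n$. A minimal sketch of that fit is given below. The error values 
are a synthetic stand-in generated from an exact power law and are not the data behind 
Figure 2; the function name `fitted_rate` is likewise an assumption of the sketch. 

```python
import numpy as np

def fitted_rate(n_values, errors):
    """Least squares slope of log(error) against log(n); returns (rate, prefactor)."""
    slope, intercept = np.polyfit(np.log(n_values), np.log(errors), deg=1)
    return -slope, np.exp(intercept)

# Synthetic stand-in for the filter error; NOT the data behind Figure 2.
n_values = np.array([10, 20, 50, 100, 200, 500, 1000])
errors = 3.0 * n_values.astype(float) ** -2.0

rate, prefactor = fitted_rate(n_values, errors)
```

With measured errors in place of the synthetic ones, `rate` is the empirical exponent to be 
compared with the theoretical value $\phi$. 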



Finally, we examine the random $c'(t; \omega')$ cases. Figure 1(f) shows the empirical 
mean when $c'(t; \omega')$ is chosen such that 

$$\int_0^t \left( c - c'(s; \omega') \right) ds = W(t),$$

where $W(t)$ is a standard Brownian motion. Theorem 4.3 tells us that the computed 
empirical distribution should have mean close to $\langle u \rangle$, and this is again the case. 



5.3. Sampling the wave velocity and initial condition 

The objective of this subsection is to show that the problems caused by model error in 
the form of a constant shift to the wave velocity can be overcome by sampling $c$ and $v_0$. 
We generate data from (12), (13) with $c' = c$ given by (33) and initial condition (34). 
We assume that neither the wave velocity nor the initial condition is known to us, and 
we attempt to recover them from given data. 

The desired conditional distribution is multimodal with respect to $c$ (recall that 
it is non-Gaussian) and care is required to seed the chain close to the desired value in 
order to avoid metastability. Although the algorithm does not have access to the true 
signal $v'_n$, we do have noisy observations of it: $y'_n$. Thus it is natural to choose as initial 
$c = c^*$ for the Markov chain the value which minimizes 




[Figure 1, panels (a)-(f). Panel (e): $\int_0^\infty (c - c'(s))\,ds = (1/2, 1/2)$. Panel (f): $\int_0^t (c - c'(s; \omega'))\,ds = W(t)$.] 

Figure 1. Figure 1(a) is the true initial condition. Figures 1(b)-1(f) show the desired 
empirical mean of the smoothing distribution $P(v_0 | Y_n = Y'_n)$ for $\delta c = (0,0)$, $\delta c = (1/2, 1/2)$, 
$\delta c \in \mathbb{R}\setminus\mathbb{Q} \times \mathbb{R}\setminus\mathbb{Q}$, $\int_0^\infty \delta c\,dt = (1/2, 1/2)$ and $\int_0^t \delta c\,ds = W(t)$, respectively. 

[Figure 2, panels (a) and (b).] 

Figure 2. Plot 2(a) shows $\| E(v_n | c, Y_n = Y'_n) - v'_n \|_{L^2(\mathbb{T}^2)}$ as a function of $n$, when 
$\int_0^\infty \delta c(s)\,ds = (1/2, 1/2)$. Its log-log plot, along with a least squares fit, is depicted in 
Plot 2(b), demonstrating quadratic convergence. 



Because of the observational noise this estimate is more accurate for small values of $k$ 
and we choose $k = (1,0)$ to estimate $c_1$ and $k = (0,1)$ to estimate $c_2$. 

Figure 3 shows the marginal distribution for $c$ computed with four different values 
of the data size $n$, in all cases with the Markov chain seeded as in (35). The results show 
that the marginal wave velocity distribution $P(c | Y_n)$ converges to a Dirac on the true 
value as the amount of data is increased. Although not shown here, the initial condition 
is also converging to a Dirac on the true value (34) in this limit. 

We round off this subsection by mentioning related published literature. First we 
mention that, in a setting similar to ours, a scheme to approximate the true wave 
velocity is proposed which uses parameter estimation within 3D-Var for the linear 
advection equation with constant velocity [9], and its hybrid with the EnKF for the 
non-constant velocity case [10]. These methodologies deal with the problem entirely in finite 
dimensions but are not limited to linear dynamics. Secondly we note that, although 
a constant wave velocity parameter in the linear advection equation is a useful physical 
idealization in some cases, it is a very rigid assumption, making the data assimilation 
problem with respect to this parameter quite hard; this is manifest in the large number 
of samples required to estimate this constant parameter. A notable, and desirable, 
direction in which to extend this work numerically is to consider the time-dependent 
wave velocity as presented in Theorems 4.1-4.3. For efficient filtering techniques to 
estimate time-dependent parameters, the reader is directed to [391 SOI IHl 112] • 



6. Conclusions 



In this paper, we study an infinite dimensional state estimation problem in the presence 
of model error. For the statistical model of the advection equation on a torus, with noisily 
observed functions in discrete time, the large data limit of the filter and the smoother 
both recover the truth in the perfect model scenario. If the wave velocity used in the 
statistical model differs from the true wave velocity in a time-integrable fashion then the 
filter recovers the truth, but the smoother is in error by a constant phase shift, determined 
by the integral of the difference in wave velocities. When the difference in wave velocities 
is a nonzero constant, neither filtering nor smoothing recovers the truth in the large data 
limit. And when the difference in wave velocities is a fluctuating random field, however 
small, neither filtering nor smoothing recovers the truth in the large data limit. 

Figure 3. The marginal distribution of $P(c, v_0 | Y_n)$ with respect to $c$, depicted 
on a square in the $c$-plane; panels (c) and (d) correspond to $n = 100$ and $n = 1000$. 
The red cross marks the true wave velocity $c = (-0.5, -1.0)$. 

In this paper we consider the dynamics as a hard constraint, and do not allow for 
the addition of mean zero Gaussian noise to the time evolution of the state. Adding 
such noise to the model is sometimes known as a weak constraint approach in the 
data assimilation community and the relative merits of hard and weak constraint 
approaches are widely debated; see [H US] for discussion and references. New techniques 
of analysis would be required to study the weakly constrained problem, because the 
inverse covariance does not evolve linearly as it does for the hard constraint problem we 
study here. We leave this for future study. 

There are a number of other ways in which the analysis in this paper could be 
generalized, in order to obtain a deeper understanding of filtering methods for high 
dimensional systems. These include: (i) the study of dissipative model dynamics; (ii) the 
study of nonlinear wave propagation problems; (iii) the study of Lagrangian rather than 
Eulerian data. Many other generalizations are also possible. For nonlinear systems, the 
key computational challenge is to find filters which can be justified, either numerically 
or analytically, and which are computationally feasible to implement. There is already 
significant activity in this direction, and studying the effect of model/data mismatch 
will form an important part of the evaluation of these methods. 

Appendix A. Basic Theorems on Gaussian Measures 

Suppose the probability measure $\mu$ is defined on the Hilbert space $\mathcal{H}$. A function $m \in \mathcal{H}$ 
is called the mean of $\mu$ if, for all $\ell$ in the dual space of linear functionals on $\mathcal{H}$, 

$$\ell(m) = \int_{\mathcal{H}} \ell(x)\,\mu(dx),$$

and a linear operator $C$ is called the covariance operator if for all $k, \ell$ in the dual space 

$$k(C\ell) = \int_{\mathcal{H}} k(x - m)\,\ell(x - m)\,\mu(dx).$$

In particular, a measure $\mu$ is called Gaussian if $\mu \circ \ell^{-1} = \mathcal{N}(m_\ell, \sigma_\ell^2)$ for some $m_\ell, \sigma_\ell \in \mathbb{R}$. 
Since the mean and covariance operator completely determine a Gaussian measure, we 
denote a Gaussian measure with mean $m$ and covariance operator $C$ by $\mathcal{N}(m, C)$. 

The following lemmas, all of which can be found in [29], summarize the properties 
of Gaussian measures which we require for this paper. 

Lemma A.1. If $\mathcal{N}(0, C)$ is a Gaussian measure on a Hilbert space $\mathcal{H}$, then $C$ is a 
self-adjoint, positive semi-definite nuclear operator on $\mathcal{H}$. Conversely, if $m \in \mathcal{H}$ and $C$ is 
a self-adjoint, positive semi-definite, nuclear operator on $\mathcal{H}$, then there is a Gaussian 
measure $\mu = \mathcal{N}(m, C)$ on $\mathcal{H}$. 

Lemma A.2. Let $\mathcal{H} = \mathcal{H}_1 \oplus \mathcal{H}_2$ be a separable Hilbert space with projections 
$\Pi_i : \mathcal{H} \to \mathcal{H}_i$, $i = 1, 2$. For an $\mathcal{H}$-valued Gaussian random variable $(x_1, x_2)$ with mean 
$m = (m_1, m_2)$ and positive-definite covariance operator $C$, denote $C_{ij} = \Pi_i C \Pi_j^*$. Then 
the conditional distribution of $x_1$ given $x_2$ is Gaussian with mean 

$$m_{1|2} = m_1 - C_{12} C_{22}^{-1} (m_2 - x_2),$$

and covariance operator 

$$C_{1|2} = C_{11} - C_{12} C_{22}^{-1} C_{21}.$$
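
In finite dimensions the formulas of Lemma A.2 reduce to the familiar matrix expressions 
for conditioning a multivariate Gaussian. The sketch below is a finite-dimensional 
illustration only; the function name `condition_gaussian`, the index-based splitting and the 
numerical values are assumptions made for the example. 

```python
import numpy as np

def condition_gaussian(m, C, idx1, idx2, x2):
    """Mean and covariance of x1 given x2 for (x1, x2) ~ N(m, C).

    idx1, idx2 : integer index lists selecting the two blocks of the state.
    """
    m1, m2 = m[idx1], m[idx2]
    C11 = C[np.ix_(idx1, idx1)]
    C21 = C[np.ix_(idx2, idx1)]
    C22 = C[np.ix_(idx2, idx2)]
    # C12 C22^{-1}, using C12 = C21^T and the symmetry of C22.
    gain = np.linalg.solve(C22, C21).T
    mean = m1 - gain @ (m2 - x2)
    cov = C11 - gain @ C21
    return mean, cov

m = np.array([0.0, 0.0])
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])
mean, cov = condition_gaussian(m, C, idx1=[0], idx2=[1], x2=np.array([1.0]))
```

Here the conditional mean is `[1.0]` and the conditional covariance is `[[1.0]]`, in agreement 
with $m_{1|2} = 0 - 1 \cdot 1^{-1} (0 - 1)$ and $C_{1|2} = 2 - 1 \cdot 1^{-1} \cdot 1$. 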

Lemma A.3. (Feldman-Hajek) Two Gaussian measures $\mu_i = \mathcal{N}(m_i, C_i)$, $i = 1, 2$, on a 
Hilbert space $\mathcal{H}$ are either singular or equivalent. They are equivalent if and only if the 
following three conditions hold: 
(i) $\mathrm{Im}(C_1^{1/2}) = \mathrm{Im}(C_2^{1/2}) := E$; 
(ii) $m_1 - m_2 \in E$; 
(iii) the operator $T := (C_1^{-1/2} C_2^{1/2})(C_1^{-1/2} C_2^{1/2})^* - I$ is Hilbert-Schmidt in $\bar{E}$. 

Lemma A.4. For any two positive-definite, self-adjoint operators $C_i$, $i = 1, 2$, on a 
Hilbert space $\mathcal{H}$, the condition $\mathrm{Im}(C_1^{1/2}) \subset \mathrm{Im}(C_2^{1/2})$ holds if and only if there exists a 
constant $K > 0$ such that 

$$(h, C_1 h) \le K (h, C_2 h), \qquad \forall h \in \mathcal{H},$$

where $(\cdot, \cdot)$ denotes the inner product on $\mathcal{H}$. 

Lemma A.5. For Gaussian $X$ on a Hilbert space $\mathcal{H}$ with norm $\|\cdot\|$ and for any integer 
$n$, there is a constant $C_n > 0$ such that $E(\|X\|^{2n}) \le C_n (E\|X\|^2)^n$. 



Appendix B. Proof of Limit Theorems 

In this Appendix, we will prove the Limit Theorems 3.2, 3.7, 4.1, where $\mathcal{L} = \mathcal{L}'$, $\mathcal{L} \neq \mathcal{L}'$, 
$\mathcal{L}(t) \neq \mathcal{L}'(t)$, and Theorem 4.3 where $\mathcal{L}(t) \neq \mathcal{L}'(t; \omega')$, respectively. In all cases, we use 
the notations $e^{-t\mathcal{L}}$ and $e^{-t\mathcal{L}'}$ to denote the forward solution operators through $t$ time 
units (from time zero in the non-autonomous case). We denote by $M$ the putative limit 
for $m'_n$. The identity 

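$$\left( nI + \Gamma C_0^{-1} \right) (m'_n - M) = \Gamma C_0^{-1} (m_0 - M) + \sum_{l=0}^{n-1} e^{t_{l+1} \mathcal{L}} \eta'_{l+1} + \sum_{l=0}^{n-1} \left( e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u - M \right), \tag{B.1}$$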

obtained from Equation (17), will be used to show $m'_n \to M$. In Equation (B.1), we will 
choose $M$ so that the contribution of the last term is asymptotically negligible. Define 
the Fourier representations 

$$e_n = m'_n - M = \sum_{k \in \mathbb{K}} \hat{e}_n(k) \phi_k, \qquad \xi_l = e^{t_{l+1}(\mathcal{L} - \mathcal{L}')} u - M = \sum_{k \in \mathbb{K}} \hat{\xi}_l(k) \phi_k.$$

Then $M$ will be any one of $u$, $\mathcal{F}_{(p,q)} u$, $\langle u \rangle$ and $u_\alpha$. Hence there is $C_1$ independent of $k, l$ 
such that $\hat{\xi}_l(0) = 0$ and $E|\hat{\xi}_l(k)| \le C_1 |(u, \phi_k)|$ with $C_1 < \infty$ (the expectation here is 
trivial except in the case of random $\mathcal{L}'$). Using Equation (17), these Fourier coefficients 
satisfy the relation 

n—l n—1 



7fc 



1=0 1=0 

In order to prove $m'_n \to M$ in $L^2(\Omega'; H^s(\mathbb{T}^2))$, we use the monotone convergence 
theorem to obtain the following inequalities, 



S II ||2 

\\^n\\L2(n';H^{T^)) 



J2 |A;|^VE|e„(A;)|2 + n^E|e„ 



1 2s 



n 



+ 2Re<j^^o(fc)E(5^eK^) 



E 



j=o 



+ 



n 



[n + lo/XoY 



70 



Ao 



2s^S-2 



Tfc 



|eo(0)|^ + 7^ri 



|eo(fc)P 



+ 7^n + 2Ci^|eo(A;)||(M,0fc)|n + E 



n-l 



1=0 



+ n' 



5-2 



|eo(0)p + 7^n 



< 



keK+ 



2s^5-2 



Ik 



Xk 



\eo{k)\' + y,n 



+ (^) \eo{k)\'+\iu,(j)k)\']n + E 



n-l 



1=0 



+ n' 



5-2 



^) |eo(0)p + 7^n 



< (c||mo-M||^.,.(^.))n^"2+ (^|A;r^7^ 

VfceK / 



n 



5-1 



< C ^||^o|Ihs+«;(t2) + ll'^ll_f/s+/ 



n-l 



1=0 



<5-2 
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VfcgK 



,5-1 



fcGK 



2 



i=0 



(B.2) 



and here the first two terms in the last equation can be controlled by 
Assumptions 3.1. In order to find $\delta \in [0, 1]$ such that this equation is $O(1)$ or $o(1)$, 



1=0 



(B.3) 



is the key term. This term arises from the model error, i.e., the discrepancy between 
the operator $\mathcal{L}$ used in the statistical model and the operator $\mathcal{L}'$ which generates the 
data. We analyze it, in various cases, in the subsections which follow. 
In order to prove $m'_n \to M$ in $H^s(\mathbb{T}^2)$, $\Omega'$-a.s., suppose we have 

$$\frac{1}{n} \sum_{l=0}^{n-1} \hat{\xi}_l(k) \to 0 \quad \Omega'\text{-a.s.}, \tag{B.4}$$

for each $k \in \mathbb{K}$. We then use the strong law of large numbers to obtain the following 
inequalities, which hold $\Omega'$-a.s., 



-n\\H''{T'2) 



\k\''\en{k)\' + \en 



k&K+ 



2s 



+ 



(7o/Ao)eo(0) + v/7^E: 



=0 VSOJ 



n + 7o/Ao 



< C^(l + |A; 



2s\ 



n 



Ik 2^1=0 ISfc 



n 



n 



fceK 

\ keK J 



(B.5) 



Therefore, using the Weierstrass M-test, we have $\|e_n\|_{H^s(\mathbb{T}^2)} \to 0$, $\Omega'$-a.s., once 
Equation (B.4) is satisfied. 

Appendix B.1. Proof of Theorem 3.2 

This proof is given directly after the theorem statement. For completeness we note that 
Equation fllSap follows from Equation (IB. 21) with 6=1. Once it has been established 
that 

then the proof that 

\\-\\h'={t^) = o (n^^) fi' - a.s., 

for any $\theta < \delta/2$ follows from a Borel-Cantelli argument as shown in the proof of 
Theorem 3.2. We will not repeat this argument for the proofs of Theorems 3.7, 4.1 
and 4.3. 



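Appendix B.2. Proof of Theorem 3.7 

Appendix B.2.1. When $\mathcal{L} \neq \mathcal{L}'$ and $\Delta t\,\delta c = (p'/p, q'/q)$, choose $M = \mathcal{F}_{(p,q)} u$ then 

$$\hat{\xi}_l(k) = \left( 1 - \delta_k^{(p,q)} \right) e^{2\pi i (k \cdot \delta c) t_{l+1}} (u, \phi_k),$$

where $\delta_k^{(p,q)}$ is 1 if $k_1$ and $k_2$ for $k = (k_1, k_2)$ are multiples of $p$ and $q$ respectively, and 
0 otherwise. Using 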



1=0 



(1 - ^m) 



^2TTi{k-5c)ti^ 



< 



1 1 

sin^ I TT I — I — 
p q 



n-l 

sin(n7r(/c ■ 6c) At) 

sm{7i{k ■ 6c) At) 
1 -1 



u,(pk) 

2 



[u,(pk 



the quantity in Equation (IB. 31) becomes 

1 V^^'' 
sm" I 7r I — I — 
p q 



[U,(pk)\ , 



5-2 



SO that from Equation (1B.2P 

|2 



Appendix B. 2. 2. When £ 7^ £' and At (5c G 

ii{k) = e^-'^''-''^''+^{u,(Pk), 
for k G K"^. It is immediate that 

II ll_f/=(T2) ~ 0(1) ^ Cl.S., 



smce 



n— 1 



^2-Ki{k-5c)t, 



= 0(1] 



choose M = (u) then 
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as n — oo, by ergodicity [33] and Equation (lB.4p . Furthermore, when S 
quantity in Equation flB.3|) is bounded by 

2 



28 

0, the 



2s 



^ n— 1 

-E 

n ^ 

1=0 



\{u,<Pk)\'<C\\u\ 



and Weierstrass M-test can be used to show 

II l|2 



o(l). 



Appendix B.3. Proof of Theorem 4.1 

When $\mathcal{L}(t) \neq \mathcal{L}'(t)$ and $\int_0^t \delta c(s)\,ds = \alpha + O(t^{-\beta})$, choose $M = u_\alpha$ then 



and we obtain 



1=0 

Now utihzing that 



1=0 ^ ^ 



«=0 



lie. 



when xi = O (/~^) and for /3 G (0, 1/2], Equation (lB.2p gives 

■"llL2(n';//«(T2)) ^ ^||?Tlo||/fs+K(T2) + 1 1 ""I I (T2) j 

VfcgK / 
+ (C2|k||H»(T2))n^-'''=0(l), 



for 5 = 1 A 2/3. 



Appendix B.4. Proof of Theorem 4.3 

When $\mathcal{L}(t) \neq \mathcal{L}'(t; \omega')$ and $\int_0^t c'(s; \omega')\,ds = \int_0^t c(s)\,ds - \varepsilon W(t)$, choose $M = \langle u \rangle$ then we 
obtain 



and 



for k G K"*". Using 



E 



n-l 



1=0 



'n—l n—1 



, ;=o «'=o 
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n—1 n—1 



we get 



for 6 = 1. 



1=0 l'=0 



n—1 n—1 



n 



n + 



1=1 l'=0 

2 



< 



^-2TT'^e'^\k\'^At{n-l) _ 

g27r2£2|fc|2At _ I \ g27r2£2|fcj2At _ 



n — 1 



^ + g27r2e2Ai _ I 



n 



[U,(Pk) 



g27r2e2At _ I 



n\[u,(pk)\ , 



n-1 



/=0 



< 



g2.2,2^, + 1 

g27r2e2At _ i 



|2 ^S-1 



\U\\ lTs{rp2\ n 0(1, 
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